Haskins Laboratories Status Report on Speech Research 1992, SR-111/112, 23-42

Articulatory Phonology: An Overview*

Catherine P. Browman and Louis Goldstein

INTRODUCTION

Gestures are characterizations of discrete, physically real events that unfold during the speech production process. Articulatory phonology attempts to describe lexical units in terms of these events and their interrelations, which means that gestures are basic units of contrast among lexical items as well as units of articulatory action. From our perspective, phonology is a set of relations among physically real events, a characterization of the systems and patterns that these events, the gestures, enter into. Thus, gestures are phonological events in the sense of Bird and Klein (1990).

While gestures are primitive phonological units, they do not correspond to either features or segments. Rather, they sometimes give the appearance of corresponding to features, and sometimes to segments. The issues discussed throughout are intended, among other things, to help clarify the differences among gestures, features, and segments. In addition, we will emphasize the following point throughout this paper: gestures and gestural organization can be used to capture both categorical and gradient information. Section 1 will present an overview of articulatory phonology, touching on a number of key aspects. Sections 2 and 3 will expand on examples in which a gestural analysis appears particularly fruitful. We will end in Section 4 with a discussion of how articulatory gestures provide a felicitous framework for dealing with language development.

Our thanks to Elliot Saltzman and Michael Studdert-Kennedy for critiquing a preliminary version of this paper, and to Lisa Zsiga and Simon Levy for help in manuscript preparation. This work was supported by NSF grant BNS 882099 and NIH grants HD-01994 and DC-00121 to Haskins Laboratories.

1.1 Gestures as dynamic articulatory structures

Gestures are events that unfold during speech production and whose consequences can be observed in the movements of the speech articulators. These events consist of the formation and release of constrictions in the vocal tract. To help in explicitly modeling these events, gestures are defined in terms of task dynamics (Saltzman, 1986; Saltzman & Kelso, 1987; Saltzman & Munhall, 1989). Task dynamics has been used to model different kinds of coordinated multi-articulator actions, including those involved in reaching and those involved in speaking. In the case of speech, the tasks involve the formation of various constrictions relevant to the particular language being spoken. Task dynamics describes such tasks using damped second-order dynamical equations to characterize the movements; see Browman and Goldstein (1990a) and Hawkins (1992) for further discussions of the use of task dynamics to characterize speech.
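The behavior of such a damped second-order system can be illustrated with a small numerical sketch. This is not the Haskins task-dynamic model itself; the parameter values, units, and simple semi-implicit Euler integration are ours, purely for illustration.

```python
# Minimal sketch of a damped second-order ("point attractor") system for a
# single tract variable: m*x'' + b*x' + k*(x - x0) = 0.
# Critical damping is used so the variable settles on its target without
# overshoot. All numbers are illustrative, not the model's actual values.

def simulate_tract_variable(x0_target, x_init, k=200.0, m=1.0, dt=0.001, steps=300):
    """Return the trajectory of a critically damped tract variable."""
    b = 2.0 * (k * m) ** 0.5        # critical damping coefficient
    x, v = x_init, 0.0
    traj = []
    for _ in range(steps):
        a = (-b * v - k * (x - x0_target)) / m
        v += a * dt                 # semi-implicit Euler step
        x += v * dt
        traj.append(x)
    return traj

# Lip aperture closing from 10 mm toward a 0 mm (full closure) target:
closure = simulate_tract_variable(x0_target=0.0, x_init=10.0)
```

The target, stiffness (k), and damping here correspond to the dynamic parameters named in the following paragraph.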

One important aspect of task dynamics is that it is the motion of tract variables and not the motion of individual articulators that is characterized dynamically. A tract variable characterizes a dimension of vocal tract constriction; the articulators that contribute to the formation and release of this constriction are organized into a coordinative structure (Fowler, Rubin, Remez, & Turvey, 1980; Turvey, 1977). For example, the tract variable of lip aperture is affected by the action of three articulators: the upper lip, the lower lip, and the jaw. The current tract variables, and their component articulators, are displayed in Figure 1. An individual tract variable control regime is specified in terms of the set of articulators used to achieve a constriction and the values of the parameters in the dynamic equation describing its movement: target (rest position), stiffness, and damping.

tract variable                           articulators involved

LP    lip protrusion                     upper & lower lips, jaw
LA    lip aperture                       upper & lower lips, jaw
TTCL  tongue tip constrict location      tongue tip, tongue body, jaw
TTCD  tongue tip constrict degree        tongue tip, tongue body, jaw
TBCL  tongue body constrict location     tongue body, jaw
TBCD  tongue body constrict degree       tongue body, jaw
VEL   velic aperture                     velum
GLO   glottal aperture                   glottis
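For concreteness, the inventory above can be written down as a small data structure. The names follow the table, but the representation itself is a sketch of ours, not that of the computational model.

```python
# Tract variables from Figure 1 with their coordinative structures:
# the articulator sets that jointly achieve each constriction.
TRACT_VARIABLES = {
    "LP":   ("lip protrusion",                    ["upper lip", "lower lip", "jaw"]),
    "LA":   ("lip aperture",                      ["upper lip", "lower lip", "jaw"]),
    "TTCL": ("tongue tip constriction location",  ["tongue tip", "tongue body", "jaw"]),
    "TTCD": ("tongue tip constriction degree",    ["tongue tip", "tongue body", "jaw"]),
    "TBCL": ("tongue body constriction location", ["tongue body", "jaw"]),
    "TBCD": ("tongue body constriction degree",   ["tongue body", "jaw"]),
    "VEL":  ("velic aperture",                    ["velum"]),
    "GLO":  ("glottal aperture",                  ["glottis"]),
}

def constricting_organ(tv):
    """Map a tract variable to one of the five constricting organ sets
    (lips, tongue tip, tongue body, velum, glottis)."""
    if tv in ("LP", "LA"):
        return "lips"
    if tv.startswith("TT"):
        return "tongue tip"
    if tv.startswith("TB"):
        return "tongue body"
    return {"VEL": "velum", "GLO": "glottis"}[tv]
```

The grouping function mirrors the point made below Figure 1: related tract variables share leading letters, and each gesture is local to one of five constricting sets.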

Figure 1. Tract variables and associated articulators.

These parameters provide a kind of internal structure for a control regime that underlies the spatiotemporal event in all its instances. A gesture in articulatory phonology is specified using a set of related tract variables. For example, in the oral tract the constriction location and degree are two dimensions of the same constriction, and therefore are considered related tract variables. In Figure 1, related tract variables contain the same first letter(s) in their names. Note that this means that each gesture is a local constriction, defined with respect to one of the five tract variable sets shown in the figure (lips, tongue tip, tongue body, velum, glottis).

Gestures can function as primitives of phonological contrast. That is, two lexical items will contrast if they differ in gestural composition. This difference can involve the presence or absence of a given gesture, parameter differences among gestures, or differences among organizations of the same gestures (discussed further in Section 1.2).

This can be illustrated with the aid of displays showing the arrangement of gestural events over time. Lexical items contrast gesturally, first of all, if a given gesture is present or absent (e.g., "add" vs. "had," Figures 2a, 2b; "add" vs. "bad," Figures 2a, 2c; "bad" vs. "pad," Figures 2c, 2d; "pad" vs. "pan," Figures 2d, 2f). We assume that, in speech mode, the larynx is positioned appropriately for voicing unless otherwise instructed. Note that "had" and "bad" would typically be considered to differ from "add" by the presence of a segment, while "bad" and "pad," and "pad" and "pan," would contrast only in a single feature, voicing or nasality respectively. Gesturally, all these contrasts are conveyed by the presence or absence of a single gesture. Another kind of contrast is that in which gestures differ in their assembly, i.e., by involving different sets of articulators and tract variables, such as lip closure vs. tongue tip closure (e.g., "bad" vs. "dad," Figures 2c, 2e). All these differences are inherently categorically distinct.
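These presence/absence contrasts can be made concrete by treating each item, ignoring timing and phasing entirely, as a set of gestures. The descriptors loosely follow Figure 2 and are simplified; this sketch captures only the presence/absence dimension of contrast.

```python
# Highly simplified gestural compositions for some of the Figure 2 items,
# as sets of (tier, descriptor) pairs. Timing is omitted, so only
# presence/absence contrasts are representable here.
WORDS = {
    "add": {("TB", "wide pharyngeal"), ("TT", "clo alveolar")},
    "had": {("TB", "wide pharyngeal"), ("TT", "clo alveolar"), ("GLO", "wide")},
    "bad": {("TB", "wide pharyngeal"), ("TT", "clo alveolar"), ("LIPS", "clo labial")},
    "pad": {("TB", "wide pharyngeal"), ("TT", "clo alveolar"), ("LIPS", "clo labial"),
            ("GLO", "wide")},
    "pan": {("TB", "wide pharyngeal"), ("TT", "clo alveolar"), ("LIPS", "clo labial"),
            ("GLO", "wide"), ("VEL", "wide")},
}

def gestural_difference(w1, w2):
    """Gestures present in exactly one of the two lexical items."""
    return WORDS[w1] ^ WORDS[w2]
```

Each of the pairs discussed in the text ("add"/"had", "add"/"bad", "bad"/"pad", "pad"/"pan") then differs by exactly one gesture, whether the traditional description would call the difference segmental or featural.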

Figure 2. Schematic gestural scores. (a) "add" (b) "had" (c) "bad" (d) "pad" (e) "dad" (f) "pan" (g) "span."

Gestures can also differ parametrically, i.e., in the values of the dynamical parameters that define the spatiotemporal structure of the articulatory event, such as a target value for the tongue tip constriction degree that would lead to a complete closure vs. a critical value that would lead to the generation of turbulence (see gestures on TT tier in Figures 2g, 2e). While such differences are not inherently categorical, we have suggested (Browman & Goldstein, 1991) that distinct ranges of the possible parameter value space (for a given articulator set) will tend to be selected by a language on the basis of quantal articulatory-acoustic relations (e.g., Stevens, 1989) and/or on the basis of adaptive dispersion principles (e.g., Diehl, 1989; Lindblom & Engstrand, 1989; Lindblom, MacNeilage, & Studdert-Kennedy, 1983; Lindblom & Maddieson, 1988). In addition to target values for constriction degree, other dynamical parameters serve to distinguish gestures as well, as discussed in Browman and Goldstein (1989, 1990a): constriction location target, stiffness (possibly, vowels vs. glides), and damping (possibly, flaps vs. stops, in languages where they contrast).

Another major function of a phonological description is to represent natural classes. Since gestures are embedded in the vocal tract, the vocal tract itself acts to organize the gestures into a hierarchical articulatory geometry (Browman & Goldstein, 1989), the levels of which have been shown to represent natural classes by work in feature geometry (e.g., Sagey, 1986). The major organizational difference between this articulatory geometry and various feature geometries has been that, in the gestural approach, constriction degree (the closest gestural analog to continuancy) is low on the tree, in effect depending from the articulator node and sister to constriction location (place), whereas in feature geometries, continuancy has typically been close to the top of the feature tree. Recent work in feature geometry, however, has begun to lower the position of continuancy or its analogs such as aperture (e.g., Clements, in press). Indeed, based on generalizations about the phonological behavior of assimilations in a variety of languages, Padgett (1991) proposes that continuancy should be represented as depending from the articulator node, a proposal consistent with the gestural approach. Such a move of course supports the relevance of the gestural unit to the organization of phonological feature geometry.

For the velic and laryngeal subsystems, featural descriptions can sometimes appear very similar to gestural descriptions. Featural descriptions of the velic and laryngeal subsystems usually contain the constriction degree of the particular articulator as an inherent aspect; in these cases, they are very close to a gestural description (for example, [+nasal] corresponds to a velic opening gesture). However, even for the velic and laryngeal subsystems, there are situations in which a featural and a gestural analysis differ. For general discussions of distinctions in voicing and aspiration in the gestural framework, see Browman and Goldstein (1986) and Goldstein and Browman (1986). (This latter paper is part of an exchange with Keating, e.g., 1984, 1990, about the viability of featural and gestural accounts of various voicing phenomena.) For a gestural analysis of the category of Hindi stop variously called "voiced aspirated," "breathy voiced," or "murmured," see Schiefer (1989), who compared a gestural account of these stops with a featural account in which the category is treated as a sequence of features (Dixit, 1987; also see Keating, 1990). Schiefer demonstrated that the sequential differences in these stops fall out naturally within the gestural framework, in which the breathy voice is realized with a single glottal gesture, timed comparatively late. Since gestures have an extent in time, and describe movements that change in amount of openness at different points during their realization, all the acoustic changes can be accounted for by this single glottal gesture (and its timing with respect to other gestures).

1.2 Gestural constellations: Combinations of overlapping gestures

As characterizations of physical events, gestures occur in space and over time. This has several implications. Since gestures have internal duration, they can overlap with each other; and since gestures are physical events, they are affected by physical processes occurring during the act of talking. In this section, we will focus on structure: how gestural overlap is used distinctively. Later sections will focus on process: how gestures vary in the act of talking.

The gestures that are employed in a given utterance are organized, or coordinated, into larger structures. We view the organization formed by those particular gestures as constituting the phonological structure of that utterance (or at least part of this structure). Of course, not every utterance in a language has an individual organization; there are general principles that define how classes of gestures are organized, or phased. These principles capture the syntagmatic aspect of a language's phonological structure, while the inventory of gestures that can participate in these organizations captures its paradigmatic aspect.

In the linguistic gestural component of the computational model currently being developed at Haskins Laboratories (see Figure 3), a first approximation of these phasing principles is used to coordinate the gestures with one another (Browman & Goldstein, 1990b). This gestural phasing results in a structure called a gestural score. A gestural score for the word "palm" (pronounced [pʰam]) can be seen in Figure 4. This representation displays the duration of the individual gestures as well as the overlap among the gestures. The horizontal extent of a given box indicates the discrete interval of time during which its particular set of values for the dynamic parameters is active. Given overlap, this means that several different gestures (sets of values) can be actively affecting the vocal tract at any particular instant in time. For example, in Figure 4, at time 50 ms, both the labial closure gesture and glottal gestures are active; by approximately time 125 ms, the labial closure gesture is no longer active but the tongue body narrow pharyngeal constriction has been activated for the vowel, so that at that point in time the glottal gesture and tongue body gesture are both active. Thus, with overlap the overall state of the vocal tract is dependent on more than one gesture.

Articulatory phonology uses "tube geometry" to characterize the patterns arising from overlapping combinations of gestures. As proposed by Browman and Goldstein (1989) and further developed by Bird (1990), tube geometry represents the constriction degree effects at each level of the vocal tract (when viewed as a set of linked tubes), and in this way forms the basis for natural classes that have been defined using features such as [sonorant].
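The idea of overlapping activation intervals can be sketched directly. The intervals below are only rough, invented readings of a score like Figure 4's "palm", not the model's actual values.

```python
# A gestural score as a list of (tier, descriptor, start_ms, end_ms)
# activation intervals. Interval endpoints are rough illustrative guesses
# for "palm" [pham], not values from the computational model.
PALM_SCORE = [
    ("LIPS", "clo labial",          0, 100),   # initial [p] closure
    ("GLO",  "wide",                0, 150),   # glottal opening (aspiration)
    ("TB",   "narrow pharyngeal", 110, 330),   # vowel [a]
    ("VEL",  "wide",              250, 400),   # velic opening for [m]
    ("LIPS", "clo labial",        280, 380),   # final [m] closure
]

def active_at(score, t_ms):
    """Tiers whose gestures are active at time t; overlap means several at once."""
    return {tier for tier, _, start, end in score if start <= t_ms < end}
```

Querying this toy score at 50 ms and 125 ms reproduces the pattern described above: labial plus glottal gestures early, then tongue body plus glottal once the labial closure has ended.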

Figure 3. Gestural computational model: an intended utterance is input to the linguistic gestural model; the resulting gestural score drives the task dynamic model, whose articulatory trajectories are passed to the articulatory synthesizer to produce the output speech.

(Figure 4 panels, top to bottom: velic aperture, tongue body constriction degree, lip aperture, glottal aperture; input string: /paam/; time axis 0-400 ms.)
Figure 4. Gestural score for the utterance "palm" (pronounced [pʰam]), with boxes and tract variable motions as generated by the computational model. The input is specified in ARPAbet, so IPA /pam/ = ARPAbet /paam/. The boxes indicate gestural activation, and the curves the generated tract variable movements. Within each panel, the height of the box indicates the targeted degree of opening (aperture) for the relevant constriction: the higher the box (or curve), the greater the amount of opening.

As currently implemented in the computational model, the phasing statements coordinate pairs of gestures by specifying a particular dynamically-defined point in each gesture that is to be synchronized. A very restricted set of points is used: for consonants, generally the achievement of the target or the beginning of movement away from the target, and occasionally the onset of movement towards the target. The importance of these or similar points has been noted by others. For example, Huffman (1990) suggested that closure onset and offset are among those "landmarks... [that] serve as the organizational pivots for articulatory coordination" (p. 78). Krakow (1989) observed regularities in the timing of the movements of the velum and lower lip with regard to these points (to be further discussed in Section 2.2). Finally, both Kingston (1985, 1990) and Stevens (in press) have emphasized the importance of related points, but defined in the acoustic domain.
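Such dynamically defined points can be located in a simulated constriction-degree trajectory. The detection logic and thresholds below are ours, a sketch for a simple close-hold-release trajectory, not the model's phasing machinery.

```python
# Locate the coordination landmarks discussed above in a constriction
# degree trajectory (smaller value = tighter constriction): onset of
# movement toward the target, achievement of the target, and onset of
# movement away from it (the release). Thresholds are illustrative.
def landmarks(traj, dt_ms=1.0):
    onset = next((i for i in range(1, len(traj))
                  if abs(traj[i] - traj[i - 1]) > 1e-4), None)
    target = traj.index(min(traj))        # first sample at maximal constriction
    release = next((i for i in range(target + 1, len(traj))
                    if traj[i] > traj[target] + 1e-4), None)
    return (None if onset is None else onset * dt_ms,
            target * dt_ms,
            None if release is None else release * dt_ms)

# A synthetic closure: hold open, close over 10 ms, hold closed, release.
traj = ([10.0] * 10 + [10.0 - i for i in range(11)]
        + [0.0] * 20 + [float(i) for i in range(1, 11)])
```

A phasing statement in this spirit would then synchronize, say, the `target` point of one gesture with the `onset` point of another.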

Notice (in Figure 4) that gestural scores provide an inherently underspecified representation (e.g., Browman & Goldstein, 1989), in that not every tract variable is specified at every point in time. This is most akin to the restricted underspecification argued for by Clements (1987) and Steriade (1987), among others. Notice also that gestural scores are exclusively tier-based. Hierarchical units such as syllables are currently generally represented by the mechanism of associations (phasing) among individual gestures rather than by hierarchical nodes. The only hierarchical unit for which we currently have evidence is that of the oral gestures in a (syllable-initial) consonant cluster (Browman & Goldstein, 1988). In these clusters, the oral gestures overlap only minimally rather than maximally as typically happens when gestures from different articulatory subsystems co-occur (e.g., the oral and glottal gestures in Figure 4).

Much of the richness of phonological structure, in the gestural framework, lies in the patterns of how gestures are coordinated in time with respect to one another. We have used the term constellations to refer to such gestural coordinations without pre-judging the correspondence between the constellations and traditional units of phonological structure (e.g., segments, syllables). Utterances comprised of the same gestures may contrast with one another in how the gestures are organized, i.e., the same gestures can form different constellations. Contrasts between nasal and prenasalized stops or between post-aspirated and pre-aspirated stops are possible examples of this kind. Considering only pair-wise combinations of gestures with a similar extent in time, Browman and Goldstein (1991) have proposed that possible contrasts in organization for these gestures are restricted to three distinct types of temporal overlap: minimal overlap, partial overlap, and complete overlap.
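The three proposed organizational types can be sketched as a classification over pairs of activation intervals. The boundary conventions (e.g., where exactly "minimal" ends and "partial" begins) are ours, for illustration only.

```python
# Classify the temporal relation between two gestures of similar extent
# from their (start_ms, end_ms) activation intervals. The cutoffs chosen
# here are illustrative, not Browman & Goldstein's formal definitions.
def overlap_type(a, b):
    """Return 'minimal', 'partial', or 'complete' overlap."""
    (a0, a1), (b0, b1) = a, b
    shared = min(a1, b1) - max(a0, b0)
    shorter = min(a1 - a0, b1 - b0)
    if shared <= 0:
        return "minimal"      # at most edge-to-edge contact
    if shared >= shorter:
        return "complete"     # one gesture lies entirely within the other
    return "partial"

# overlap_type((0, 100), (100, 200)) -> "minimal"
# overlap_type((0, 100), (50, 150))  -> "partial"
# overlap_type((0, 100), (10, 90))   -> "complete"
```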

Gestural organization is constrained in more specific, language-dependent ways as well. For example, Browman and Goldstein (1986) proposed two organizational principles governing glottal opening-and-closing gestures occurring in word-initial onsets (for at least a subset of Germanic languages, including English): (1) that glottal peak opening is synchronized to the midpoint of any fricative gestures, and otherwise to the release of any closure gestures (following Yoshioka, Löfqvist, & Hirose, 1981), and (2) there is at most a single glottal gesture word-initially. Given these generalizations, word-initial "sp" and "p" are both presumed to have a single glottal gesture, as shown in Figures 2f and 2g (rather than the two glottal gestures for "sp" expected from a segmental analysis, see e.g., Saltzman & Munhall, 1989). The (allophonic) difference in aspiration between "sp" and "p" then follows automatically from timing principle (1) combined with the fact that gestures are events with temporal extent.

The fact that gestures are events with temporal extent can also eliminate the need for certain phonological adjacency constraints, which can often be seen to follow directly from gestural overlap. For example, much work in feature geometry (e.g., Clements, in press; McCarthy, 1988; Sagey, 1986) constrains assimilation to be the spreading of a feature to an adjacent slot, rather than the replacement of one feature by another. From the point of view of gestural overlap, many cases of "assimilation" or apparent "coarticulatory" feature-spreading follow directly from the fact that several gestures are co-occurring, either lexically or through later concatenation or sliding. (This will be discussed further in Sections 2 and 3; see also Bell-Berti & Harris, 1981, 1982; Boyce, 1990; Boyce, Krakow, Bell-Berti, & Gelfer, 1990; Fowler, 1980; Gelfer, Bell-Berti, & Harris, 1989.) As these authors have also emphasized, there is no need to spread a feature, since gestures already have an inherent extent in time. A related constraint, that "total place assimilation in consonants will be restricted to immediately adjacent consonants" (Clements, in press, p. 29), also follows directly from gestural overlap. Zsiga (1993) discusses a number of cases in which overlap can account for various phonological phenomena (as well as some problem areas for a gestural account). In general, the existence of gestural overlap means that a number of phonological constraints follow automatically rather than having to be stipulated.

The general style of coordination (or phasing) between gestures may also vary from language to language. Smith (1988, 1991) has provided acoustic and articulatory evidence that temporal patterns in Italian and Japanese are affected differently by the change of an intervocalic consonant from singleton to geminate, and Dunn (1990) has found similar evidence in a comparison of Italian and Finnish. Smith found that, in Italian, no effect on the timing of the vowels was observed when consonants differed between singleton and geminate, but in Japanese, the intervowel organization was significantly altered. Such results are consistent with a gestural organization for Italian in which the vocalic gestures are directly phased with each other, and for Japanese in which vocalic gestures are phased only indirectly, by being phased with respect to the intervening consonantal gestures. In turn, such different coordination types are consistent with the characterization of Japanese as mora-timed (e.g., Han, 1962; Port, Dalby, & O'Dell, 1987) and Italian as syllable- (or possibly stress-) timed (e.g., Farnetani & Kori, 1986). The gestural account of such "rhythmic" differences as being due to a difference in direct or indirect coordination of vowels not only provides a potentially explanatory account of phonological differences, but predicts such phonetic detail as whether the vowels are shortened as intervening consonants are added (or lengthened).

2. CONTRAST AND ALLOPHONIC VARIATION

We often refer to a gestural analysis as an analysis of the "input," and more traditional analyses as analyses of the "output," where input and output refer to descriptions of the (local) articulatory gestural organization and resulting global vocal tract shape/acoustics, respectively.

Traditional segmental analyses are descriptions of the combined effects of the (overlapping) gestures in a gestural constellation, and therefore are typically descriptions of the acoustics, and therefore the "output," in our terminology. Even featural descriptions often refer to attributes of segments, and are therefore often "output" descriptions. This is the source of the differences in description between the gestural approach, on the one hand, and segments and/or features on the other hand. An example of the descriptive differences has already been alluded to, regarding the voicing and aspiration issue (Browman & Goldstein, 1986; Goldstein & Browman, 1986; Keating, 1984, 1990; Schiefer, 1989). In this section, we will present a number of examples of gestural analyses of cases that have traditionally been analyzed in segmental and/or featural terms as different kinds of allophonic variation, showing that the gestural analyses capture a wider range of behavior, and do so by using general principles rather than special category-changing rules. At the same time, the underlying "input" structures capture contrast in a simple fashion.

Traditionally, the complement to contrast has been seen as identity. That is, two primitive phonological units either contrast or they are considered to be identical. Where this identity is at odds with phoneticians'/phonologists' percept of speech, the historical solution was to posit a single underlying phonemic (or phonological) unit, with distinct allophonic units in a more narrow phonetic representation (cf. discussion in Anderson, 1974). The same phoneme is "spelled" as categorically distinct allophones in different environments. However, when articulatory gestures are used as phonological primitives, much of the variation that was traditionally captured by a distribution of distinct allophonic units can, instead, be captured either by quantitative variation in the "input" parametric specification of a given gesture, or as a direct "output" consequence of overlap of invariant gestural units.

Generalizations. There are cases in which a gestural analysis reveals generalizations that have been missed in traditional allophonic descriptions. For example, there are cases in which two very different allophonic rules (when couched in terms of segments and features) must be posited to describe what is quantitative variation in one and the same gesture in the same contexts. Further, there are cases in which particular prosodic contexts (e.g., stress and syllable positions) show a very similar influence on gestures of different types (oral and laryngeal, for example), or on their organization. We will discuss such cases below.

Relation between allophonic and other variation. There is much systematic, quantitative variation of speech gestures that has never been captured in a narrow allophonic transcription of the conventional sort, and could not be easily described in this way (e.g., differences in the magnitude and duration of stop consonant gestures in different prosodic environments; Browman & Goldstein, 1985; Kelso, V.-Bateson, Saltzman, & Kay, 1985; Munhall, Ostry, & Parush, 1985). As will be argued below, there is no principled difference between this kind of variation and the kind that has been annotated in a narrow transcription. In fact, we will examine cases in which the same parameter of variation has been treated as allophonic in some contexts and (implicitly) as quantitative in others. Moreover, as others have argued (e.g., Pierrehumbert, 1990; Sproat & Fujimura, 1989), this intermediate allophonic representation does not contribute in a useful way to the complete description of the variability. It is either unnecessary, or gets in the way of stating the generalizations. Thus, it seems that many allophonic differences are just quantitative differences that are large enough that phoneticians/phonologists have been able to notice them, and to relate them to distinctive differences found in other languages.

In this section, then, we will see that the very same syntagmatic organization will give rise to superficially different kinds of "output" variation such as "coarticulation" and allophonic differences, depending on the nature of the particular gestures in the organization (2.1, 2.2). In addition, we will see that general patterns of quantitative variation in gestural parameters can also give rise to a variety of superficially unrelated "output" consequences (2.3).

2.1 "Coarticulation" of consonants and vowels

In the phasing rules that are currently implemented in our model, oral constriction gestures are divided into two functional classes: vocalic and consonantal (Browman & Goldstein, 1990b). The distinction reflects the intrinsic differences between the two classes of gestures in their dynamical parameters. The consonantal gestures typically have a greater degree of constriction and a shorter time constant (higher stiffness) than the vocalic gestures. Syllable-sized organizations are defined by phasing (oral) consonant and vowel gestures with respect to one another. The basic relationship is that initial consonants are coordinated with vowel gesture onset, and final consonants with vowel gesture offset (the specific points being coordinated also differ in the two cases). This results in organizations in which there is substantial temporal overlap between movements associated with vowel and consonant gestures, as was seen in the gestural score of Figure 4.

When the same (invariant) consonant gesture is coproduced with different overlapping vowel gestures (e.g., in [ada] vs. [idi]), the articulator motions produced by the task dynamic model will differ, reflecting the vowel gestures' demands on the articulators that they share in common with the consonant. As discussed in Saltzman and Munhall (1989), the nature of this variation produced by the model will differ depending on whether the overlapping gestures are defined with respect to the same or distinct tract variables. In the case of distinct tract variables (e.g., TT for [d] and TB for vowels), the consonant gesture will achieve its invariantly specified tract variable (TT) target regardless of what vowel is overlapping, although the particular contribution of articulators used to achieve this target (jaw, tongue body, tongue tip) will differ depending on the vowel. Thus, the overall shape of the vocal tract produced during the tongue tip closure will differ in [ada] and [idi]. As shown in Saltzman and Munhall (1989), this difference corresponds to that seen in Öhman's (1967) X-rays. The different articulatory trajectories will produce different acoustic formant frequency transitions for the two stops, but apparently no difference in the consonant's percept (Fowler, 1980; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman & Mattingly, 1985).

In the case where consonants and vowels share the same (TB) tract variables (e.g., the consonant [g] as in [aga] or [igi]), the consonant and vowel gestures cannot both simultaneously achieve their targets, since they are attempting to move exactly the same structures to different positions. As a result the location (but not degree) of constriction achieved for the consonant will vary as a function of the overlapping vowel (Saltzman & Munhall, 1989). Again, this is consistent with the X-ray data of Ohman (1967). In this case, however, the difference is perceptible (at least to phoneticians), and has sometimes been represented by distinct "front" and "back" allophones.
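One way to see why shared tract variables force this kind of variation is a toy weighted-target average. The numbers, weights, and the simple averaging rule below are invented for illustration; they are not the Saltzman-Munhall blending dynamics.

```python
# Hypothetical sketch: when two concurrently active gestures pull the
# SAME tract variable toward different targets, the outcome is a blend,
# so neither target is fully achieved.

def blended_target(targets, weights):
    """Weighted average of concurrently active gestures' targets."""
    total = sum(weights)
    return sum(t * w for t, w in zip(targets, weights)) / total

# Tongue-body constriction location in made-up units (0 = front, 1 = back):
# velar stop [g] target = 0.8, vowel [i] target = 0.2, vowel [a] target = 0.9.
g_with_i = blended_target([0.8, 0.2], [1.0, 1.0])   # fronter closure
g_with_a = blended_target([0.8, 0.9], [1.0, 1.0])   # backer closure
# The closure location shifts with the vowel, as in [igi] vs. [aga],
# while the degree of constriction (closure) can still be reached.
```

The contrast with the [d] case is that there the consonant's TT tract variable has no competing vowel demand, so its target is reached exactly and only the articulator contributions vary.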

These examples of consonant/vowel overlap illustrate two important points about gestural structures. First, they show how, as invariantly specified phonological units, gestures can give rise to context-dependent articulatory and acoustic trajectories, without having to posit any "implementation rules" for converting specific invariant (phonological) units into variable (physical) parameters. The variation follows directly from the definition of the units as parameterized task-dynamical systems, their phonological organization (pattern of overlap), and the general principles of how overlapping units blend. The same gestural structures simultaneously characterize phonological properties of the utterance (contrastive units and syntagmatic organization) and physical properties. Second, this example suggests how the very same syntagmatic structure (pattern of overlap) can yield different kinds of variation (allophonic vs. just "articulatory-acoustic"), as a function of the particular gestures involved; in particular, whether those gestures use the same or different articulator sets.

2.2 High-level units in velic and oral subsystems

Recently, the differing intergestural organization found in different (syllable) positions has been investigated in detail for two different gestural constellations in English: nasal consonants (Krakow, 1989) and /l/ (Sproat & Fujimura, submitted). Both are constellations comprising two gestures: a nasal consonant includes oral constriction and velic lowering gestures; /l/ includes tongue tip constriction and tongue body retraction gestures. Comparison of the data from these two papers reveals important similarities in how gestural organization varies as a function of position, despite differences in the traditional descriptions. For nasals, the traditional account characterizes syllable position differences by spreading the relevant feature ([nasal]) to the preceding vowel in the syllable-final case (e.g., Ladefoged, 1982), while for /l/, the position differences in certain dialects of English are handled by positing different allophones ("clear" vs. "dark," differing in the feature [back], e.g., Keating, 1985) in initial and final position. However, as we saw with consonant-vowel overlap, this turns out to be an example in which the syntagmatic organization of the gestures is the same in these two cases, an aspect missed by the allophonic and featural descriptions.

Krakow's (1989) results show a clear difference in coordination between word-initial nasals (e.g., "see more") and word-final nasals (e.g., "seem ore"). In the word-initial case, the end of the velum lowering movement is roughly synchronous with the end of the lip closing movement. The gestures appear to be phased so that the effective achievement of their targets coincides. For the word-final case, however, the end of velum lowering occurs substantially earlier (100-350 ms) than the end of lip movement. In fact, the end of velum lowering appears to coincide with the beginning of the lip closing movement in this case. Syllable-position effects are similar to these word-position effects.

Sproat and Fujimura (submitted) found that the tongue body retraction (TB) and tongue tip raising (TT) movements for English /l/ also differ in their coordination as a function of word position. In word-initial position (e.g., "B. Likkovsky"), the extremum of the TB movement follows the TT extremum slightly, while in the word-final position (e.g., "Beel, equate") the TB extremum occurs substantially earlier than that for TT. Sproat and Fujimura manipulated the strength of the prosodic boundary following non-initial /l/, from none (e.g., "Beelik") to an intonation break (e.g., "Beel, equate"), and concluded that there is continuous variation in the relative timing of the two movements as a function of the boundary strength. However, examination of the relative timing data for prevocalic /l/ shows that, in general, truly word-final /l/s show the pattern with TB leading (with the magnitude of the lead affected by the strength of the following syntactic boundary), while the non-word-final cases (initial, medial, and medial before morphological boundaries) show either simultaneity or a slight lagging of TB.

There is an apparent similarity, then, in the behavior of the gestures forming the constellations for nasals and /l/. Both constellations exhibit changes in relative timing as a function of word (or possibly syllable) position. In both cases, non-final position shows the gestures more nearly synchronous than in final position, and in both cases, it is the gesture with the narrower oral constriction (lip closure for the nasals, TT raising for /l/) that lags substantially in final position. In the case of the nasals, there is evidence for a specific shift in phasing: the end of velum lowering is coordinated with the end of lip closing for initials, but the beginning of the lip closing movement for finals.

It would strengthen the parallelism if evidence for such a shift also existed for /l/. Sproat and Fujimura did not examine this directly, although there is some indirect evidence in their data for such a shift. In final position, the TB gesture offset (as measured by movement extrema) precedes the TT gesture offset substantially. If the TB gesture offset were, in fact, being coordinated with TT gesture onset, as the analogy with the nasal behavior would predict, then as the TT movement increases in duration (e.g., before different boundaries), the measured offset-to-offset lag between the gestures should increase proportionally. Sproat and Fujimura measured the acoustic duration of the pre-boundary rime (which presumably is related to the acoustic duration of the /l/, and hence to the movement duration of TT); a clear correlation between this duration and the offset-to-offset lag for final /l/ can be observed in their Figure 8. This parallels a correlation between lip closure duration and offset-to-offset lag found by Krakow for the final nasals. Moreover, the points in Sproat and Fujimura's figure corresponding to non-final /l/ show TT leading, and do not appear to show any correlation between the magnitude of the TT lead and /l/ duration. This lack of correlation with duration would be expected if the offsets were being coordinated in this case, and such a lack of correlation is also found for non-final nasals.

The parallelism of nasals and /l/ reveals organizational patterns that are similar across subsystems and correlated with position in the word (or syllable). Viewing these behaviors gesturally suggests a (speculative) possible wider generalization, namely that there is a single syllable-final organizational pattern in which the wider constrictions always precede narrower constrictions (reminiscent of the sonority hierarchy; cf. also the related hypotheses of Sproat & Fujimura, submitted, and Mattingly, 1981). The same pattern would then be invoked for the (vocalic) tongue body and (consonantal) tongue tip gestures in "add," the two /l/-related tongue gestures in syllable-final /l/, and the velic and lip (or tongue) gestures in syllable-final nasals. Parallelism between the velic and oral subsystems has been noted elsewhere as well. For example, Browman (in press) showed how, if syllable-final vowel nasalization were treated as a long velic gesture, then similarities in behavior between syllable-final nasals and long oral gestures, i.e., geminates, on a gating task (Lahiri & Marslen-Wilson, 1992) could be explained.

The similarities across subsystems revealed in these studies are generalizations only in a gestural approach, and not in the more traditional analyses of these variations as being different in kind (in the nasal and /l/ example, as feature-spreading and different feature values, respectively). While the articulatory and acoustic consequences differ depending on the particular gestures involved, in a gestural approach these consequences do not need to be explicitly controlled, as they are automatic consequences of the syntagmatic organization and the particular gestures involved.

2.3 Glottal gestures: Positional (and other) variants

We have seen in previous subsections how what is traditionally described as contextual or allophonic variation can result automatically from the fact of overlap between invariant gestural units (e.g., overlap between consonants and vowels), or from differences in the characteristic patterns of overlap of gestures in syllable-initial and -final positions. In addition, some kinds of allophonic variation can be shown to result from quantitative variation in a gesture's dynamic parameters as a function of prosodic variables such as stress and position. Gestures shrink in space and in time in some contexts. This latter kind of variation is quite constrained: it scales the metric properties of a gestural event, but does not alter the composition of articulatory components out of which it is assembled.
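The constraint just stated can be made concrete with a small sketch: reduction rescales a gesture's spatial and temporal extent while leaving its articulatory composition untouched. The dictionary fields and scale factors are illustrative assumptions, not quantities from the model.

```python
# Hypothetical sketch of constrained prosodic reduction: scale the
# metric properties (displacement, duration) of a gesture, but never
# change which articulators compose it.

def reduce_gesture(gesture, space_scale, time_scale):
    """Return a reduced copy; the articulator set is untouched."""
    reduced = dict(gesture)                 # shallow copy, original kept
    reduced["displacement"] *= space_scale
    reduced["duration"] *= time_scale
    return reduced

# A glottal spreading gesture in made-up units.
glottal = {"articulators": ("glottis",), "displacement": 1.0, "duration": 100.0}

# Medial unstressed position: the most reduced variant, but the same
# kind of event, not a categorically different one.
medial_unstressed = reduce_gesture(glottal, 0.4, 0.6)
```

On this view, "aspirated" and "unaspirated" variants differ only in the scale factors applied, which is the analysis pursued for English voiceless stops below.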

Aspiration in English. A relevant example involves voiceless stops in English. Traditionally, these have been described as having aspirated and unaspirated allophones in different environments. Kahn (1976), for example, defines the environment that selects the aspirated allophone as "exclusively syllable-initial," with the unaspirated allophone occurring elsewhere. Kahn's rule assigns the feature [+spread glottis] in these aspirated environments, with [-spread glottis] generally being used for unaspirated allophones. This distinction is not an accurate characterization of the aspiration differences in English; nor is it either accurate or desirable to use a categorical rule to describe the aspiration of stops in English.

In many of the environments in which the output appears to be unaspirated, there is in fact a glottal opening-and-closing gesture present in the input. That is, presence or absence of aspiration in the output is generally not a discrete function of whether or not the glottis is spread, but rather is either a function of the timing of the glottal gesture with an associated oral gesture or a (gradient) function of the magnitude of the glottal gesture. The first cause of lack of aspiration in the output occurs in initial [s]-stop clusters, as mentioned in Section 1, in which lack of aspiration automatically results from the pattern of overlap among the contrastive gestures. As noted previously, English has a constraint that at most one glottal opening (spreading) gesture can occur in word-initial position. When this single gesture is associated with a fricative gesture, whether as a singleton or as a member of a sequence of oral gestures, the peak glottal opening is phased to the middle of the fricative gesture (probably its peak displacement). In the case of an [s]-stop cluster, this means that the glottis is already narrowed by the time the stop is released, which results in a "short lag" in the onset of voicing following release (VOT). This is the basis for the description of stops in such clusters as voiceless unaspirated (Lisker & Abramson, 1964).
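The timing logic can be sketched numerically. All timings, the linear closing approximation, and the function itself are invented assumptions for illustration; only the qualitative point (peak phased earlier gives shorter VOT) comes from the text.

```python
# Toy sketch of why phasing the glottal peak to the middle of [s]
# yields a short-lag VOT: by the stop's release, the glottis has
# nearly closed again, so voicing can begin soon after release.

def vot(release_time, glottal_peak_time, closing_rate, aperture_peak):
    """Time (ms) after oral release until glottal aperture returns to 0.
    Assumes, as a deliberate simplification, linear closing from the peak."""
    closed_at = glottal_peak_time + aperture_peak / closing_rate
    return max(0.0, closed_at - release_time)

# Singleton aspirated stop: glottal peak roughly at the release itself.
aspirated = vot(release_time=100, glottal_peak_time=100,
                closing_rate=0.01, aperture_peak=1.0)

# [s]-stop cluster: the single glottal gesture peaks mid-[s],
# well before the stop release (times in ms, all made up).
cluster = vot(release_time=100, glottal_peak_time=40,
              closing_rate=0.01, aperture_peak=1.0)
```

The same invariant glottal gesture thus produces either a long-lag ("aspirated") or short-lag ("unaspirated") output purely as a consequence of its phasing relative to the oral gestures.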

The second cause of lack of aspiration in the output is the gradient reduction of glottal magnitude due to differences in stress and position. In analyses such as Kahn's, stress and position allophones are represented categorically. Voiceless stops are unaspirated in word-medial position before unstressed vowels (e.g., "rapid") because they are "ambisyllabic" rather than exclusively syllable-initial and therefore are represented as [-spread glottis]. However, voiceless stops are aspirated ([+spread glottis]) in the same position when before stressed vowels because they are considered to be syllable-initial. Single stops in word-initial position before either stressed or unstressed vowels are also aspirated and represented as [+spread glottis]. This categorical approach to aspiration is not supported by a recent study by Cooper (1991), who used transillumination to measure glottal aperture in four environments: initial vs. medial, before stressed and unstressed vowels.

Examining these four environments in two-syllable reiterant speech utterances (/pipip/, /titit/, and /kikik/), Cooper found, first of all, that there was a glottal spreading gesture in all four environments, contrary to the prediction that the unaspirated environment is [-spread]. Secondly, he found effects of both stress and word position on the magnitude of the glottal spreading gesture (in both space and time), with initial position and stress favoring larger gestures. Thus, the medial unstressed position showed the smallest glottal spreading gesture overall. From a gestural point of view, there is nothing special or categorically different about the medial unstressed case: it is simply the environment that shows the most gestural reduction because of the combined effect of stress and position. In an analysis such as Kahn's, in which the medial unstressed case is viewed as an allophone categorically distinct from the form occurring in the other three environments, one would expect to observe qualitatively distinct laryngeal behavior in the medial unstressed case. This expectation is not borne out by Cooper's data. A weaker prediction of the categorical view is that there should be a robust interaction between stress and position factors, such that stress has a large effect medially, but little or no effect initially. This weaker prediction is also not borne out: the utterances with /t/ and /k/ generally show no interaction at all (although an interaction is observed for /p/). Cooper's own conclusion, based on additional experiments not summarized here, is that stress and word position, rather than syllable structure and aspiration category, are the relevant variables that regulate laryngeal behavior of voiceless consonants in English.

Voicelessness in final position differs from that in other positions. In final position (word or possibly syllable), the glottal spreading gesture in English is usually not observed at all (e.g., Lisker & Baer, 1984). However, the muscular activity normally associated with spreading gestures (increased activity of the posterior cricoarytenoid muscles, suppression of the interarytenoid) is found for such final stops in Lisker and Baer's data (and also in Hirose & Gay, 1972), although reduced in magnitude. This is consistent with a gestural reduction analysis: final position represents the most extreme case of reduction. However, analysis of final position is complicated by the fact that a constriction of the false (ventricular) folds is sometimes observed in this position (Fujimura & Sawashima, 1971; Manuel & Vatikiotis-Bateson, 1988). It is presumably this constriction that led Kahn to posit yet a third allophone for voiceless stops ([+constricted glottis]) in final position. Since the relation between this constriction and the muscular control of the glottis (proper) has not been explicitly investigated, it is not clear how to relate this constriction to the glottal spreading gesture.

Aspiration and "h." As reported above, positional and stress allophones of English voiceless stops result from quantitative variation in gesture magnitude (with the possible exception of the final ventricular constriction). Since the unit of reduction is the gesture, the gestural analysis predicts that similar patterns of reduction should be found, regardless of whether they have been analyzed as a segment ("h") or a feature ([+spread]). Pierrehumbert and Talkin (in press) have recently measured the amount of reduction in glottal abduction for "h" in various prosodic contexts, using acoustic analysis to estimate the actual abduction. While most of their focus was on more global prosodic structure (phrasal accent and intonation boundaries), they also found reduction effects due to word stress and position generally similar to those found by Cooper (1991) (although as noted above, Cooper's data show some degree of influence of the supralaryngeal gesture on the laryngeal gesture). In a non-gestural approach, the similarity in behavior of "h" and [+spread] is not captured, since unlike aspiration in stops, the variation in "h" is not usually represented at all, even by distinct allophonic units (except where the reduction is so extreme that it is sometimes analyzed as deleted, for example in "vehicle"). In a gestural approach, however, the same reduction process gives rise to both kinds of variation.

There is also a symmetry in final position between voiceless stops and "h" in English. In final position, glottal spreading gestures are reduced to the limiting case of no observable opening. This is exactly the environment in which "h" does not occur in English. In a gestural framework, this distributional fact follows from the facts of reduction noted in voiceless stops. That is, words cannot have a contrastive glottal spreading gesture in final position, because such gestures are reduced to zero in final position, regardless of whether the glottal spreading gesture co-occurs with an oral constriction or not. (Contrast between final voiced and voiceless stops is possible only because this contrast involves other differences, such as vowel length [Lisker, 1974], which can themselves be analyzed as overlap differences between consonant and vowel gestures; Fujimura, 1981.) In more traditional approaches, this relationship between the distribution of "h" and the allophones of voiceless stops is not captured.

Generalizations across glottal and oral gestures. If the variation in the glottal gesture due to position and stress is in fact due to a general process, then such variation should be observed in other gestures occurring in similar environments. Similarities in the behavior of glottal and oral movements due to position and stress differences have indeed been observed.

The behavior of tongue tip movements is known to be affected by stress and position. For example, flapping of alveolar closures in English tends to occur in medial unstressed environments (Kahn, 1976), where we have seen that there is also substantial reduction in glottal spreading. If we assume that a flap is a reduced tongue tip closure gesture, reduced in time and possibly also in displacement, then the tongue tip and glottal gestures are behaving similarly. Apparent counter-examples are the medial unstressed alveolar stops that have not been considered to be flaps (e.g., in "after"). Since glottal gesture reduction applies in "after" (the "t" isn't aspirated), one would expect a reduced alveolar gesture here as well. However, these cases can be handled very nicely when input and output descriptions are properly distinguished. Although the alveolar in "after" is not considered to be a flap, it is possible that the alveolar closure is reduced in this context (input), but that the percept of a flap (output) depends on having an open vocal tract both before and after the reduced tongue tip movement. This analysis is related to that of Banner-Inouye (1989), who analyzes flapping in English autosegmentally as resulting from spreading of "open aperture" ([-cons]), from either side onto the timing slot associated with a coronal consonant. The phenomenon of flapping is thus analyzed by her as a short (single timing slot) open-closed-open contour that results from spreading in English. In the gestural framework, the reduction (making the movement short) would occur regardless of what other gestures are involved, but the description (or percept) of the resulting structure as a flap would depend on an open-closed-open acoustic contour (i.e., the structure in "butter" but not "after"). That is, the reduction process would always reduce the oral gesture in this environment, but the contour that is perceived as a flap would simply be one of the possible output consequences, depending on the appropriate set of gestures.

There are also potential parallels between glottal spreading and tongue tip closure gestures in final position. As we shall see in the next section, final alveolar closure gestures are subject to a variable amount of reduction in final position, including the failure to achieve any tongue tip contact. This is, of course, reminiscent of the frequent failure to see any actual glottal opening finally. When such reduced final alveolars coincide with the ventricular constriction discussed above, this produces the structure that has traditionally been described as the glottal stop [ʔ] allophone of /t/. The confluence of these events can be seen in the fibroscopic and palatographic data of Manuel and Vatikiotis-Bateson (1988).

Other oral constriction gestures also exhibit patterns of reduction similar to those exhibited by the glottal spreading and alveolar closure gestures. For example, bilabial closure gestures show effects of stress (e.g., Beckman, Edwards, & Fletcher, 1992; Kelso et al., 1985) and stress/position (Browman & Goldstein, 1985; Smith, Browman, McGowan, & Kay, submitted), similar to those shown by glottal gestures. These papers show substantial reduction of labial gestures in non-initial reduced syllables (initial reduced syllables were not examined). Thus, the reduction processes associated with stress and position in English for glottal gestures appear to be general, operating on tongue tip and labial gestures occurring in the same environments. Note again that while the variation in the dynamics of the tongue tip gesture has been represented as allophonic, the variation in the lip gesture has not been. Yet both seem to be instances of a very general reduction process, one that also operates on glottal gestures.

In addition to looking at similarities in the environments in which different kinds of reduction occur, it is possible to focus on the form of the reduction itself, as observed in the dynamic properties of the gestures. Munhall et al. (1985) have demonstrated similarities in the velocity profiles of movements of the glottis and the tongue dorsum (in /k/). In addition, the quantitative changes in the kinematic properties (i.e., displacement, duration, peak velocity) for different stress conditions were shown to be similar for the tongue dorsum and glottal movements.

In summary, allophonic variation associated with prosodic variables such as position and stress has been shown, in many cases, to be a constrained quantitative and gradient variation, rather than a categorical variation. Viewing such variation as gradient changes within a gestural framework captures similarities in behavior across position and stress and across different featural and segmental characterizations of glottal spreading gestures, and also captures similarities in behavior across different articulatory subsystems.

3. VARIATION DURING THE ACT OF TALKING

In this section, we examine some of the consequences of using the gestural approach to analyze phonological and phonetic variation that can be attributed to processes occurring during the physical act of talking. This variation arises from two interlocking sources, one gradient and one categorical. Beginning with a contrastive canonical gestural structure, processes occurring during the act of talking will cause gradient changes that can ultimately be perceived as a categorically different gestural structure. This is due, among other things, to the fact that the acoustic (as well as articulatory) consequences of a given invariantly specified gesture will differ depending on what other gestures are concurrently active (Browman & Goldstein, 1990a, 1990b). The following examples will show how the constrained processes available in the gestural view provide a unified and explanatory view of a variety of superficially different kinds of phonetic and phonological alternations.

3.1 Speech production errors: Connected speech

One aspect of the act of talking that appears to be well handled by a gestural account is that of speech production errors. Mowrey and MacKay (1990) recorded muscle activity for [l] during experimentally induced speech errors in tongue twisters such as "Bob flew by Bligh Bay." In one session, about a third of the 150 tokens showed anomalous muscle activity, such as insertion of [l] activity in "Bob" or "Bay" and diminution of [l] activity in "flew" or "Bligh." Only five of these tokens, however, involved all-or-none behavior; most of the activity was gradient. That is, the magnitude of activity in both the inserted and "original" [l] fell on a continuum. Some of the errors were small enough so that they were not audible. The timing of the inserted activity was, however, localized and consistent. Such errors, in which the positioning (organization) is categorical but the magnitude is gradient, can be handled very naturally in a gestural framework.

Another aspect of the act of talking that is well handled in the gestural framework involves alternations that occur in connected speech. As shown in some of the data summarized below, in connected speech the patterns of gestural overlap may vary. In particular, factors associated with increased fluency (e.g., increased rate, more informal style) result in increasing the temporal overlap among gestures. Additionally, prosodic boundaries may influence the degree of overlap between neighboring gestures that belong to successive words. We have hypothesized that this kind of variation can result in changes that have traditionally been described as "fast speech" alternations of various sorts, and have presented articulatory evidence for this (Browman & Goldstein, 1990a, 1990b). However, it is important to note that such gestural sliding is endemic in talking (e.g., Hardcastle, 1985), and not limited to the cases that have been noted as alternations. Thus, this is another situation (like those discussed in Section 2) in which some, but not others, of the results of a single gradient process have been noted in phonetic transcriptions. In a gestural account, a single generalization (increase in overlap) characterizes all these cases.

Evidence for increased overlap as rate increases has been presented for consonant and vowel gestures (Engstrand, 1988; Gay, 1981) and for the laryngeal gestures for two voiceless consonants in contiguous words (Munhall & Löfqvist, 1992). Hardcastle (1985) has presented evidence for variation in gestural overlap as a function of prosodic boundary strength as well as rate. Using electropalatography, he measured overlap in time between the dorsal closure for /k/ and the onset of the tip/blade contact for a following /l/. The /kl/ sequences employed included word-initial clusters and examples in which the /k/ and /l/ were separated by various boundaries (syllable, word, clause, and sentence). Sentences were read at fast and slow rates. In general, the amount of overlap was consistently greater at the fast rate than at the slow rate. The effect was observed in all phonological and syntactic contexts, but was largest at the clause and sentence boundaries. Here, slow rates often showed long "separation" intervals between the gestures (rather than overlap), while fast rates tended to show considerable overlap, often greater than that seen in the within-word or within-phrase cases. Thus, both rate and prosodic boundaries influence gestural overlap.

In this example, variation in gestural overlap did not produce changes that have been described as connected speech alternations. However, we have proposed (Browman & Goldstein, 1990b) that there are circumstances in which increased overlap would result in such alternations. One such circumstance we refer to as gestural "hiding." This occurs when gestures employing distinct tract variables (cf. Section 2.1) increase their overlap to such an extent that even though all the relevant constrictions are formed, one of them may be acoustically (and perceptually) hidden by another overlapping gesture (or gestures). X-ray evidence for this hiding analysis was provided in Browman and Goldstein (1990b). For example, two productions of the sequence "perfect memory" were analyzed, one produced as part of a word list (and thus with an intonation boundary between the two words), the other produced as part of a fluent phrase. In the fluent phrase version, the final [t] of "perfect" was not audible, and it would be conventionally analyzed as an example of alveolar stop deletion in clusters (e.g., Guy, 1980).

However, the articulator movements suggested that the alveolar closure gesture (for the [t]) still occurred in the fluent version, with much the same magnitude as in the word list version that had a clearly audible final [t]. The difference was that in the fluent version, the alveolar closure was completely overlapped by other stop gestures: the closure portion by the preceding velar closure ([k]), the release portion by the following labial closure (for the [m]). Thus, from the point of view of an articulatory phonology, all the phonetic units (gestures) were present in both versions. The difference between the list and fluent forms was due to variation in the gradient details of overlap, a process for which there is independent evidence. In other contexts, for example when a velic lowering gesture co-occurred with the hidden gesture, hiding produced apparent assimilations, rather than deletions. Thus, in the phrase "seven plus" produced at a fast rate, the final consonant of "seven" was audibly [m], but evidence for an alveolar closure was still present. Only a single gesture was hidden (the oral alveolar closure gesture) and not a segment-sized constellation of gestures. It is precisely this fact that leads to the percept of assimilation rather than deletion in this kind of example.

In analyzing casual speech alternations as resulting from gestural overlap, we were led to make the strong hypothesis (Browman & Goldstein, 1990b) that all examples of fluent speech alternations are due to two gradient modifications to gestural structure during the act of talking: (a) increase in overlap and (b) decrease in gesture magnitude. (The latter modification is related to the gestural modifications as a function of prosodic structure discussed in Section 2.) A typical example of magnitude reduction might be the pronunciation of the medial (velar) consonant in "cookie" as a fricative rather than as a stop (Brown, 1977). Under this hypothesis, casual speech variation is quite constrained: all the lexical phonological units are present, though they may be decreased in magnitude and overlapped by other gestures. Gestures are never changed into other gestures, nor are gestures added.

3.2 Assimilation of final alveolars

A related hypothesis has been proposed by Nolan (in press), based on analyses of apparent assimilations of single final alveolar stops to following labial and velar stops (e.g., /t/ --> [k] in "...late calls..."). Using electropalatographic contact patterns, he found that the final alveolars were present, but reduced in degree to a variable extent, in the forms that were perceived as assimilated (see also Barry, 1985; Kerswill, 1985, for examples of such "residual" tongue tip gestures). Moreover, even in cases in which no alveolar electropalatographic contact was observed, the assimilated forms were perceptually distinguishable from forms with no lexical alveolar stop gesture at all (e.g., assimilated "bed" vs. "beg"). These findings led Nolan to propose that "differences in lexical phonological form will always result in distinct articulatory gestures." From the point of view of articulatory phonology, this constraint follows quite naturally: the phonological form is an organization of gestural events.

Nolan's experiments on the class of final alveolar assimilations focussed on the role played by the reduction of the tongue tip gesture. In addition to reduction, however, the overlap between that gesture (reduced or not) and the following stop gesture may play a role in perceived assimilations. The role of overlap in the acoustics and perception of similar assimilations was investigated by Byrd (1990). Using the computational gestural model discussed in Section 1.2, Byrd generated utterances with a continuum of overlap for each of the phrases "bad ban" and "bab dan" by systematically varying the overlap between the alveolar and bilabial closure gestures. She found an asymmetry between the perceptions of the gestures in word-final position. When the first word ended in [d], the word-final alveolar was perceived as being assimilated to the following [b] when overlap increased substantially. However, with the same amount of overlap, the word-final [b] was not assimilated, and in fact, the following word-initial [d] in such cases tended to be perceived as being assimilated to the [b]. (An asymmetry in the same direction, although less extreme, was found when subjects listened to the first word extracted.) Byrd related this perceptual asymmetry in favor of the labial closure to the VC and CV formant transitions produced by synchronous (overlapping) labial and alveolar closure gestures. In general, such formant transitions were more similar to those produced by labial stops alone than those produced by alveolar stops alone. Thus, the effect of overlap tended to obscure final alveolars but not final labials. This could contribute to the tendency in English for final alveolar stops (but not final labials or velars) to assimilate to following stops (Gimson, 1962).
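The stimulus-construction logic of such an overlap continuum can be sketched abstractly. The function, step scheme, and durations below are illustrative assumptions and not Byrd's actual model settings.

```python
# Hypothetical sketch of generating an overlap continuum: slide the
# second closure gesture's onset earlier in fixed steps relative to
# the first, from fully sequential to fully synchronous.

def overlap_continuum(c1_onset, c1_dur, c2_dur, steps):
    """Yield (c2_onset, proportion of gesture 1 overlapped) per step."""
    c1_offset = c1_onset + c1_dur
    for i in range(steps + 1):
        lag = c1_dur * (1 - i / steps)        # step 0: no overlap
        c2_onset = c1_onset + lag
        overlapped = min(c1_offset, c2_onset + c2_dur) - c2_onset
        yield c2_onset, max(0.0, min(overlapped, c1_dur)) / c1_dur

# Two 100-ms closure gestures, five overlap steps from 0% to 100%.
points = list(overlap_continuum(0, 100, 100, 4))
```

Each step of such a continuum could then be synthesized and played to listeners, which is the design that reveals the perceptual asymmetry between final alveolars and labials.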

The simulation results of Byrd suggest that formant frequency transitions into final alveolar stops should vary as a function of the following stop (as long as they are at least partially overlapping). This hypothesized acoustic "context effect" was confirmed in an investigation of natural speech by Zsiga and Byrd (1990). They examined formant frequency transitions into the medial closure in phrases like "bad pick," "bad tick," and "bad kick" produced at different rates. The major finding was that formant transitions shifted away from those expected for an alveolar stop towards those expected for the following consonant: either a labial stop, as in "bad pick," or a velar stop, as in "bad kick." In the case of the following labial, the effects on formant transitions agreed with those observed in Byrd's simulations of "bed ban" in which the labial closure gesture overlapped the alveolar gesture: both F2 and F3 were lower at the offset of the first vowel for "bad pick" than for "bad tick." The magnitude of these effects was generally smaller than that found in Byrd's complete synchrony condition, which is consistent with the fact that final alveolar consonants in this natural speech experiment were actually perceived as such and were not assimilated to the following labials or velars. In general, perceptual assimilation should occur only when the effects of gradient overlap and reduction exceed some perceptual threshold.

A second finding of Zsiga and Byrd's was that, for utterances where the second word in the phrase began with a velar stop (e.g., "bad kick"), a systematic relation was observed between temporal and spectral properties as rate was varied. When rate variation resulted in a decrease in the total duration of the medial closure, there was also an increase in the velar effects seen in the formant transitions. This relation can be simply accounted for by assuming that these cases involve increased overlap between the tongue tip and tongue body gestures.

Finally, an ongoing experiment by A. Suprenant is explicitly testing the relative contributions of overlap and gestural magnitude to the percept of final stops. The experiment employs tokens of utterances like "My pot puddles" collected at the X-ray microbeam facility at the University of Wisconsin. These tokens show variation both in the magnitude of tongue tip raising for the final [t] in "pot" and in the temporal overlap of that gesture and the lip closure gesture of the following word. Listeners are presented with these sentences in a speeded "detection" task. Preliminary results suggest that detection of "t" is a function both of its magnitude and of its amount of overlap with the following consonant.

3.3 Reduced syllable deletion

Assimilations (and deletions) of stop consonants represent only one kind of fluent speech alternation. Another example that follows directly from changes in gestural overlap is deletion of schwa in reduced syllables. For example, in a word like "beret," the vowel in the first syllable, either [ə] or [ɪ], may be apparently deleted in continuous speech, producing something transcribed as [bɹeɪ]. The tendency for deletion has been shown to be a "graded" one, dependent on a number of contextual factors (e.g., Dalby, 1984). We have demonstrated (Browman & Goldstein, 1990a) that the concomitant shift in syllabicity could be the perceptual consequence of an increase in overlap between the initial labial closure gesture and the tongue gestures for the "r." This was shown by using the computational gestural model to generate a continuum in which the degree of overlap or separation between the control regimes for the labial closure and the "r" varied in small steps. In the canonical organization for "beret," the labial and "r" gestures did not overlap at all. This meant that the labial gesture was released before the "r" was formed. This differed from the canonical organization for "bray," in which the gestures were partially overlapping (like the velar and "l" gestures in the clusters illustrated in Hardcastle, 1985). When listening to items from the continuum in a forced choice test, subjects responded with "bray" to items in which the labial and "r" gestures overlapped, and "beret" to items in which they did not overlap.

Thus it is possible to view reduced syllable deletion as resulting from an increase in gestural overlap in fluent speech. This treatment is attractive for two reasons. First, it treats deletion as resulting from the same general process that gives rise to other (superficially unrelated) alternations. Second, it leaves us with the claim that all phonetic units constituting a lexical item are still present in fluent speech; only the overlap has changed, in a predictably gradient way. This seems to be a more natural treatment than one which would assume that an important structural unit (a syllable) is suddenly and completely eliminated in fluent, connected speech.

Another important aspect of this treatment of reduced syllables is the fact that the lexical difference between "bray" and "beret" was modeled only in terms of the coordination of the labial closure and "r" gestures. There was no explicit tongue gesture for a schwa. This hypothesis was sufficient to generate gestural scores that produced speech with the appropriate perceptual properties, for both "bray" and "beret." In addition, the overlap of the vertical components of their articulatory trajectories was consistent with tokens of this distinction collected using the X-ray microbeam system at the University of Wisconsin (Browman & Goldstein, 1990a).

However, in another investigation of reduced syllables (Browman & Goldstein, in press), data analysis and modeling revealed that an explicit tongue gesture for a schwa was required in utterances of the form [ˈpVpəpVp], although the target of the required gesture was completely colorless in that it was the average of the tongue body positions for all full vowels for that speaker. Therefore, at the very least, development of a more complete typology of the gestural structure of reduced syllables is needed, and is currently being pursued, to evaluate the phonological and morphological conditions for schwas of various kinds, both in English and in other languages. With respect to deletion processes, however, we should note that even if there is a tongue gesture associated with a particular schwa, an increase in overlap between the consonants on either side of it could result in hiding that gesture. Thus, even if an active schwa gesture is required in a word like "difficult," an increase in overlap such that the labiodental fricative and the velar stop partially overlap could result in hiding of this gesture.
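The "colorless" schwa target just described is simply the mean of the speaker's full-vowel tongue body positions. A minimal sketch of that computation (the vowel inventory and the (x, y) coordinates below are invented for illustration, not measured values):

```python
# Sketch of the "colorless" schwa target: the tongue body target is the
# average of this (hypothetical) speaker's full-vowel tongue body positions.
# All coordinate values here are invented for illustration.

full_vowel_positions = {
    "i": (1.2, 0.9), "e": (1.0, 0.4), "a": (0.2, -0.8),
    "o": (-0.6, 0.1), "u": (-0.9, 0.8),
}

def schwa_target(positions):
    """Mean (x, y) tongue body position over all full vowels."""
    xs = [p[0] for p in positions.values()]
    ys = [p[1] for p in positions.values()]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

print(schwa_target(full_vowel_positions))  # averaged tongue body position
```

Because the target is an average over the whole vowel inventory, it carries no distinctive vowel quality of its own, which is the sense in which the schwa gesture is "colorless."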

In summary, increase in overlap among gestures in fluent speech is a general gradient process that can produce apparent (perceived) discrete alternations. The examples above were describable as consonant deletions, consonant assimilations, and vowel deletions; another possible example is that of epenthetic stops in English (e.g., Anderson, 1976; Ohala, 1974), as discussed in Browman and Goldstein (1990b). However, the fact that stop epenthesis in words like "tense" is not found in some dialects of English (South African: Fourakis, 1980) raises the larger issue of variability of fluent speech alternations across dialects and languages. That is, if the process of increase in overlap is a completely general property of talking, why does it create epenthetic stops in one dialect but not another? We have suggested (Browman & Goldstein, 1989) that such dialect/language differences may arise from differences in the canonical patterns of coordination in the different languages. Two kinds of coordination differences are relevant here. First, languages may differ in the amount of canonical overlap between two gestures. For example, sequences of stops in English are canonically partially overlapping (Catford, 1977), whereas sequences in Georgian are canonically non-overlapping, i.e., are released stops (Anderson, 1974). We would expect that an amount of increase in overlap that produces hiding in English would not necessarily do so in a language such as Georgian. Second, two gestures may be directly phased with respect to one another in one language, but only indirectly phased in another language (as discussed in Section 1.2). It is possible that gestures that are directly phased will be more likely to retain their canonical organization in connected speech.

4. DEVELOPMENTAL DATA

Developmental studies show that a child's first words are stored and retrieved not as phonemes but as holistic patterns of "articulatory routines" (e.g., Ferguson & Farwell, 1975; Fry, 1966; Locke, 1983; Studdert-Kennedy, 1987; Vihman, 1991). Recent research has suggested that the basic units of these articulatory routines are discrete gestures that emerge pre-linguistically (during babbling), and which can be seen as early "gross" versions of the gestures that adults use (e.g., Browman & Goldstein, 1989; Studdert-Kennedy, 1987; Studdert-Kennedy & Goodell, in press). Further development can be viewed as differentiation (in terms of parameter values) and coordination of these basic gestures. For example, other recent studies (Fowler, Brady, & Curley, 1991; Nittrouer, Studdert-Kennedy, & McGowan, 1989) have shown that coordination into segment-sized units (one kind of constellation) only appears gradually during the course of language acquisition, which not only supports the contention that phonemes are not present in a child's first words, but also suggests that higher-level units are formed out of smaller units during the course of language development. If so, then articulatory phonology would provide a very appropriate approach to child language, and its use would facilitate the study of language development both theoretically and methodologically, since both child and adult utterances can be described in terms of the same basic gestural primitives.

Fowler, Brady, and Curley (1991) studied experimentally induced speech production errors in CVC utterances by children and adults, using phonetic transcriptions by trained listeners to indicate the existence of an error. The purpose of the study was to test the hypothesis that organization into phonological structures smaller than the level of the lexical item only appears gradually during the course of language-learning. Fowler et al. found that younger children were much more prone to blend features in their errors than were adults, as in the error "bam till" from the utterance "pam dill." Adults were correspondingly more likely to retain higher-level organization, whether segmental or subsyllabic, that is, to produce the error "dam pill" from the utterance "pam dill." Thus, in this experiment with single-segment onsets, onset (or segment) exchanges increased with age (4- and 5-year-olds 33%, 8-year-olds 44%, adults 74%), while feature blends decreased (4- and 5-year-olds 33%, 8-year-olds 18%, adults 8%).

The Fowler et al. results support the hypothesis that lexical organization intermediate between the levels of the feature (or gesture) and the word develops as part of learning the language. However, the results do not distinguish between a featural analysis and a gestural analysis. Another study, that of Studdert-Kennedy and Goodell (in press), supports the gesture as the unit out of which words are formed as the child develops language. This study focussed on another kind of "error," the differences between the child's pronunciation and the canonical adult one. The utterances of a child in transition from babble to speech (91-106 weeks) were recorded. The errors in these utterances were argued to arise either from "paradigmatic confusions among similar gestures ... or from syntagmatic difficulties in coordinating the gestures that form a particular word" (p. 20).

If gestures originate as pre-linguistic units of action, and gradually develop into the units of contrast, as argued by Studdert-Kennedy (1987) and Browman and Goldstein (1989), then it is possible to see a continuity of development in language. If these gestures then serve as the primitives that are further coordinated in the language-learning process, such continuity includes higher-level phonological units as well as the fundamental contrastive units.

REFERENCES

Anderson, S. R. (1974). The organization of phonology. New York: Academic Press.

Anderson, S. R. (1976). Nasal consonants and the internal structure of segments. Language, 52, 326-344.

Banner-Inouye, S. (1989). The flap as a contour segment. UCLA Working Papers in Phonetics, 72, 40-81.

Barry, M. (1985). A palatographic study of connected speech processes. Cambridge Papers in Phonetics and Experimental Linguistics, 4.

Beckman, M. E., Edwards, J., & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 68-86). London: Cambridge University Press.

Bell-Berti, F., & Harris, K. S. (1981). A temporal model of speech production. Phonetica, 38, 9-20.

Bell-Berti, F., & Harris, K. S. (1982). Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, 71, 449-454.

Bird, S. (1990). Constraint-based phonology. Doctoral dissertation, University of Edinburgh.

Bird, S., & Klein, E. (1990). Phonological events. Journal of Linguistics, 26, 33-56.

Boyce, S. E. (1990). Coarticulatory organization for lip rounding in Turkish and English. Journal of the Acoustical Society of America, 88, 2584-2595.

Boyce, S. E., Krakow, R. A., Bell-Berti, F., & Gelfer, C. (1990). Converging sources of evidence for dissecting articulatory movements into core gestures. Journal of Phonetics, 18, 173-188.

Browman, C. P. (1992). Comments on Chapter 9. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (Chapter 9, pp. 257-260). London: Cambridge University Press.

Browman, C. P., & Goldstein, L. (1985). Dynamic modeling of phonetic structure. In V. A. Fromkin (Ed.), Phonetic linguistics (pp. 35-53). New York: Academic Press.

Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.

Browman, C. P., & Goldstein, L. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140-155.

Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.

Browman, C. P., & Goldstein, L. (1990a). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.

Browman, C. P., & Goldstein, L. (1990b). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and physics of speech (pp. 341-376). Cambridge: Cambridge University Press.

Browman, C. P., & Goldstein, L. (1991). Gestural structures: Distinctiveness, phonological processes, and historical change. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the Motor Theory of Speech Perception (pp. 313-338). Hillsdale, NJ: Lawrence Erlbaum Associates.

Browman, C. P., & Goldstein, L. (1992). 'Targetless' schwa: An articulatory analysis. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 26-56). London: Cambridge University Press.

Brown, G. (1977). Listening to spoken English. London: Longman Group Ltd.

Byrd, D. (1992). Perception of assimilation in consonant clusters: A gestural model. Phonetica, 49, 1-24.

Catford, J. C. (1977). Fundamental problems in phonetics. Bloomington: Indiana University Press.

Clements, G. N. (1987). Toward a substantive theory of feature specification. Proceedings of NELS 18, 1, 79-89.

Clements, G. N. (in press). Place of articulation in consonants and vowels: A unified theory. In B. Laks & A. Rialland (Eds.), L'Architecture et la géométrie des représentations phonologiques. Paris: Editions du C.N.R.S.

Cooper, A. (1991). An articulatory account of aspiration in English. Unpublished doctoral dissertation, Yale University.

Dalby, J. M. (1984). Phonetic structure of fast speech in American English. Unpublished doctoral dissertation, Indiana University.

Diehl, R. (1989). Remarks on Stevens' quantal theory of speech. Journal of Phonetics, 17, 71-78.

Dixit, R. P. (1987). Mechanisms for voicing and aspiration: Hindi and other languages compared. UCLA Working Papers in Phonetics, 67, 49-102.

Dunn, M. H. (1990). A phonetic study of syllable structure in Finnish and Italian. Paper presented at the 26th meeting of the Chicago Linguistic Society, parasession on the syllable, April.

Engstrand, O. (1988). Articulatory correlates of stress and speaking rate in Swedish VCV utterances. Journal of the Acoustical Society of America, 83, 1863-1875.

Farnetani, E., & Kori, S. (1986). Effects of syllable and word structure on segmental durations in spoken Italian. Speech Communication, 5, 17-34.

Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in early language acquisition. Language, 51, 419-439.

Fourakis, M. S. (1980). A phonetic study of sonorant-fricative clusters in two dialects of English. Research Institute in Phonetics, 1, 167-200, Indiana University.

Fowler, A., Brady, S., & Curley, S. (1991). The phoneme as an emergent structure: Evidence from speech errors. Paper presented at the April 1991 SRCD meeting.

Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing control. Journal of Phonetics, 8, 113-133.

Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. (1980). Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production. New York: Academic Press.

Fry, D. B. (1966). The development of the phonological system in the normal and the deaf child. In F. Smith & G. Miller (Eds.), The genesis of language: A psycholinguistic approach (pp. 187-206). Cambridge, MA: MIT Press.

Fujimura, O. (1981). Elementary gestures and temporal organization-What does an articulatory constraint mean? In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 101-110). Amsterdam: North-Holland.

Fujimura, O., & Sawashima, M. (1971). Consonant sequences and laryngeal control. Annual Bulletin, Research Institute of Logopedics and Phoniatrics, University of Tokyo, 5, 1-6.

Gay, T. (1981). Mechanisms in the control of speech rate. Phonetica, 38, 148-158.

Gelfer, C. E., Bell-Berti, F., & Harris, K. S. (1989). Determining the extent of coarticulation: Effects of experimental design. Journal of the Acoustical Society of America, 86, 2443-2445.

Gimson, A. C. (1962). An introduction to the pronunciation of English. London: Edward Arnold Publishers, Ltd.

Goldstein, L., & Browman, C. P. (1986). Representation of voicing contrasts using articulatory gestures. Journal of Phonetics, 14, 339-342.

Guy, G. R. (1980). Variation in the group and the individual: The case of final stop deletion. In W. Labov (Ed.), Locating language in time and space (pp. 1-36). New York: Academic Press.

Han, M. (1962). The feature of duration in Japanese. Study of Sounds, 10, 65-80.

Hardcastle, W. J. (1985). Some phonetic and syntactic constraints on lingual coarticulation during /kl/ sequences. Speech Communication, 4, 247-263.

Hawkins, S. (1992). An introduction to task dynamics. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 9-25). London: Cambridge University Press.

Hirose, H., & Gay, T. (1972). The activity of the intrinsic laryngeal muscles in voicing control: Electromyographic study. Phonetica, 25, 140-164.

Huffman, M. K. (1990). Implementation of nasal: Timing and articulatory landmarks. UCLA Working Papers in Phonetics, 75.

Kahn, D. (1976). Syllable-based generalizations in English phonology. Bloomington: Indiana University Linguistics Club.

Keating, P. A. (1984). A phonetic and phonological representation of stop consonant voicing. Language, 60, 286-319.

Keating, P. A. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA Working Papers in Phonetics, 62, 1-13.

Keating, P. A. (1990). Phonetic representations in a generative grammar. Journal of Phonetics, 18, 321-334.

Kelso, J. A. S., V.-Bateson, E., Saltzman, E., & Kay, B. (1985). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. Journal of the Acoustical Society of America, 77, 266-280.

Kerswill, P. E. (1985). A sociophonetic study of connected speech processes in Cambridge English: An outline and some results. Cambridge Papers in Phonetics and Experimental Linguistics, 4.

Kingston, J. (1985). The phonetics and phonology of the timing of oral and glottal events. Unpublished doctoral dissertation, University of California, Berkeley.

Kingston, J. (1990). Articulatory binding. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and physics of speech (pp. 406-434). Cambridge: Cambridge University Press.

Krakow, R. A. (1989). The articulatory organization of syllables: A kinematic analysis of labial and velic gestures. Unpublished doctoral dissertation, Yale University.

Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York: Harcourt Brace Jovanovich.

Lahiri, A., & Marslen-Wilson, W. (1992). Lexical processing and phonological representation. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 229-254). London: Cambridge University Press.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431-461.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.

Lindblom, B., & Engstrand, O. (1989). In what sense is speech quantal? Journal of Phonetics, 17, 107-121.

Lindblom, B., MacNeilage, P., & Studdert-Kennedy, M. (1983). Self-organizing processes and the explanation of phonological universals. In B. Butterworth, B. Comrie, & O. Dahl (Eds.), Explanations of linguistic universals (pp. 181-203). The Hague: Mouton.

Lindblom, B., & Maddieson, I. (1988). Phonetic universals in consonant systems. In L. M. Hyman & C. N. Li (Eds.), Language, speech, and mind (pp. 62-78). London: Routledge.

Lisker, L. (1974). On time and timing in speech. In T. A. Sebeok (Ed.), Current trends in linguistics, Vol. 12 (pp. 2387-2418). The Hague: Mouton.

Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 385-422.

Lisker, L., & Baer, T. (1984). Laryngeal management at utterance-internal word boundary in American English. Language and Speech, 27, 163-171.

Locke, J. L. (1983). Phonological acquisition and change. New York: Academic Press.

Manuel, S. Y., & V.-Bateson, E. (1988). Oral and glottal gestures and acoustics of underlying /t/ in English. Journal of the Acoustical Society of America, 84, S84.

Mattingly, I. G. (1981). Phonetic representation and speech synthesis by rule. In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 415-420). Amsterdam: North-Holland.

McCarthy, J. J. (1988). Feature geometry and dependency: A review. Phonetica, 45, 84-108.

Mowrey, R. A., & MacKay, I. R. A. (1990). Phonological primitives: Electromyographic speech error evidence. Journal of the Acoustical Society of America, 88, 1299-1312.

Munhall, K., & Löfqvist, A. (1992). Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics, 20, 111-126.

Munhall, K. G., Ostry, D. J., & Parush, A. (1985). Characteristics of velocity profiles of speech movements. Journal of Experimental Psychology: Human Perception and Performance, 11(4), 457-474.

Nittrouer, S., Studdert-Kennedy, M., & McGowan, R. (1989). The emergence of phonetic segments: Evidence from the spectral structure of fricative-vowel syllables as spoken by children and adults. Journal of Speech and Hearing Research, 32, 120-132.

Nolan, F. (1992). The descriptive role of segments: Evidence from assimilation. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 261-280). London: Cambridge University Press.

Ohala, J. J. (1974). Experimental historical phonology. In J. M. Anderson & C. Jones (Eds.), Historical linguistics (pp. 353-389). Amsterdam: North Holland.

Öhman, S. E. G. (1967). Numerical model of coarticulation. Journal of the Acoustical Society of America, 41, 310-320.

Padgett, J. (1991). Stricture in feature geometry. Unpublished doctoral dissertation, University of Massachusetts, Amherst.

Pierrehumbert, J. (1990). Phonological and phonetic representation. Journal of Phonetics, 18, 375-394.

Pierrehumbert, J., & Talkin, D. (1992). Lenition of /h/ and glottal stop. In G. Docherty & D. R. Ladd (Eds.), Papers in Laboratory Phonology II (pp. 90-117). London: Cambridge University Press.

Port, R. F., Dalby, J., & O'Dell, M. (1987). Evidence for mora timing in Japanese. Journal of the Acoustical Society of America, 81, 1574-1585.

Sagey, E. C. (1986). The representation of features and relations in non-linear phonology. Unpublished doctoral dissertation, MIT.

Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Experimental Brain Research Series 15 (pp. 129-144). New York: Springer-Verlag.

Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.

Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.

Schiefer, L. (1989). 'Voiced aspirated' or 'breathy voiced' and the case for articulatory phonology. Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München, 27, 257-278.

Smith, C. (1988). A cross-linguistic contrast in consonant and vowel timing. Journal of the Acoustical Society of America, 86, S84.

Smith, C. (1991). The timing of vowel and consonant gestures in Italian and Japanese. Paper presented at the 12th International Congress of Phonetic Sciences, Aix-en-Provence, France, August 19-24.

Smith, C., Browman, C. P., McGowan, R., & Kay, B. (submitted). Extracting dynamic parameters from speech movement data.

Sproat, R., & Fujimura, O. (1989). Articulatory evidence for the non-categoricalness of English /l/ allophones. Paper presented at the LSA annual meeting, Washington, DC, December.

Sproat, R., & Fujimura, O. (submitted). Allophonic variation in English /l/ and its implications for phonetic implementation.

Steriade, D. (1987). Redundant values. CLS, 23, 339-363.

Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.

Stevens, K. N. (in press). Phonetic evidence for hierarchies of features. In P. A. Keating (Ed.), Papers in Laboratory Phonology III. Cambridge: Cambridge University Press.

Studdert-Kennedy, M. (1987). The phoneme as a perceptuomotor structure. In A. Allport, D. MacKay, W. Prinz, & E. Scheerer (Eds.), Language perception and production (pp. 67-84). London: Academic Press.

Studdert-Kennedy, M., & Goodell, E. W. (in press). Gestures, features and segments in early child speech. In B. de Gelder & J. Morais (Eds.), Language and literacy: Comparative approaches. Cambridge, MA: MIT Press.

Turvey, M. T. (1977). Preliminaries to a theory of action with reference to vision. In R. Shaw & J. Bransford (Eds.), Perceiving, acting and knowing: Toward an ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates.

Vihman, M. M. (1991). Ontogeny of phonetic gestures: Speech production. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the Motor Theory of Speech Perception (pp. 69-84). Hillsdale, NJ: Lawrence Erlbaum.

Yoshioka, H., Löfqvist, A., & Hirose, H. (1981). Laryngeal adjustments in the production of consonant clusters and geminates in American English. Journal of the Acoustical Society of America, 70, 1615-1623.

Zsiga, E. C. (1993). Gradient rules in phonology and phonetics. Unpublished doctoral dissertation, Yale University.

Zsiga, E. C., & Byrd, D. (1990). Acoustic evidence for gestural overlap in consonant sequences. Journal of the Acoustical Society of America, 88, S82.

FOOTNOTES

*Phonetica, 49, 155-180 (1992).

†Also Department of Linguistics, Yale University.