Sequences and segments: phonemic status and gestural timing in English

Adam Albright

March 19, 2000

1 Introduction: sequences and segments

With the advent of theories like Articulatory Phonology (e.g. Browman and Goldstein 1992), it is increasingly popular to assume that the set of phonological primitives includes not only events (either acoustic or articulatory), but also the timing relations between these events. If the timing relationship between events can be specified in the lexicon, however, then this provides one more way in which languages can differ arbitrarily from each other. As counterintuitive as this might be, there does seem to be some empirical support for it. To pick just one example, work by Smith (1993) on the timing relations between consonants and vowels in Italian and Japanese shows that in fact languages may differ in whether vowels are anchored to consonants or to other vowels. This seems like an alarming amount of detailed, language-specific variation to allow – especially if we consider that these relations must be learned somehow. How much articulatory detail do children need to learn explicitly in order to acquire their languages? Why would children bother to learn such minute details at all, when we might have thought a priori that these could perfectly well be filled in by some universal default mechanism for concatenating speech segments?

Byrd (1994) suggests a possible explanation for some low-level timing differences between languages. She points out that consistent timing relations could be a reflex of “segmenthood” in the language (p.154). If this is true, then perhaps children could learn segmenthood through regular distributional means, and the articulatory differences between languages merely reflect the differences between implementing a segment and implementing a sequence.

As an example, Byrd considers a velar stop gesture and a labial stop gesture, which could be combined to create a single phonological segment (the doubly articulated [k͡p] phoneme), or could be combined to create a sequence (the sequence [kp]). Under this approach, the single phoneme would combine the two gestures in a stable timing relation across all speech rates, whereas the sequence would have more variable timing across different speech rates. Maddieson (1993) presents EMA data from Ewe showing that the first half of this is true – namely, that labial-velar stops are indeed composed of otherwise “canonical” [k] and [p] gestures, and that these gestures stand in a particular timing relation to each other to form the k͡p phoneme.

But what about languages in which [kp] forms mere sequences, such as English? Byrd suggests that in these cases, the [k] and [p] gestures should have a more variable timing relation. However, her other experiments on tongue body gestures show interesting evidence that even sequences have relatively constant timing relations when they are part of the same morpheme. The constancy of timing between tautomorphemic gestures is shown even more clearly by Cho (1999). So what is the prediction, exactly? If a [kp] sequence is tautomorphemic, should it have a stable timing relation because the whole morpheme is specified for such relations? Or should it have a variable timing relation because it is not a single phoneme, but rather the concatenation of two phonemes? Perhaps it has a stable timing relation, but the degree of overlap is not so great in a sequence as it is in a simplex phoneme? Or perhaps there is actually a three- (or more)-way stability distinction, with single phonemes at one end and sequences across sentence boundaries at the other end? Although Byrd’s suggestion about the difference between sequences and complex segments is appealing, the details are not at all obvious.

The present study, therefore, compares the gestural timing of sequences like [kp] with the timing of complex phonemes such as the [k͡p]. Since Maddieson (1993) already contains a relatively detailed account of the complex [k͡p] phoneme in Ewe, the logical case to investigate is the non-phonemic [kp] sequence of English. The underlying goal here is to understand how phoneme inventory interacts with intergestural timing, so that we can view them as interrelated instead of as two independent facts for children to learn.

2 Hypotheses

Since this study is an explicit comparison with Maddieson (1993), it is useful to use his hypotheses for Ewe as a starting point. There are several factors which make the comparison between English and Ewe imperfect, however. Most importantly, the fact that [k͡p] is phonemic in Ewe but not in English means that its syllabification differs across the two languages. In Ewe, a word like [ak͡pa] has a syllable break before the [k͡p]: [a.k͡pa]. In English, on the other hand, syllable breaks always come between two stops: [ak.pa]. Since, as Byrd (1994) and Maddieson (1993) both point out, syllable position may play an important role in shaping articulatory patterns, we should not necessarily expect the word-internal [k] gestures in English [aka] and [akpa] to be identical. Instead, we must compare the [k] in [akpa] with a single coda [k], such as before a word boundary: [ak # a]. This is not particularly a problem, because this study is not designed to test the more general question of whether [k]s and [p]s are articulated the same in English and Ewe. Nevertheless, it does add one more factor which must be controlled for in constructing the test materials for English.

With these caveats in mind, we can test the following hypotheses for English:

1. Just as in Ewe, the velar and labial gestures in English [kp] sequences are identical in time-course and magnitude to the single gestures in simplex [k] and simplex [p] (when syllable position is controlled for)

2. Unlike Ewe, the overlap (in msec) of these two gestures in English varies as a function of the rate of speech. (This variability in the degree of overlap is what Byrd suggests should act as a cue for sequence, or heterosegmental status in English)

3. The degree of overlap determines the overall duration of the sequence (i.e. the intrinsic durations of the gestures themselves do not vary)

4. The degree of overlap in tautomorphemic [kp] sequences should be comparable to the degree of overlap in other tautomorphemic sequences, such as [kt] or [pk]. (Not tested in Maddieson 1993)

3 Materials

3.1 Corpus

In addition to [kp] sequences, the corpus also included words with simple [k] and [p] phonemes, in order to test the hypothesis that the [kp] sequence is really composed of two simple gestures. Words with [kt] and [pk] sequences were also included, in order to compare the degree of overlap in different sequences. Sequences like [kp] cannot occur word-initially or word-finally in English, so all sequences and phonemes of interest were intervocalic. In fact, English has very few words with [kp] sequences: backpack, crockpot, cockpit, chickpea, stickpin, and jackpot are the most common ones.¹

Since all of the [kp] words are bisyllabic with initial stress, this formed the template for all of the other sequences. For each vocalic context, I attempted to find words with all of the possible stop phonemes/sequences in intervocalic position. (Not all slots could be filled for all vocalic contexts, but each stop appeared in enough vocalic contexts to allow for all possible pairwise comparisons.) Finally, as discussed above, the [k] in [kp] is actually in the coda position of the first syllable, so I used monosyllables followed by cliticized pronouns for [k] – for example, cockpit was compared with rock it, not with rocket.

The resulting word list is given in Table 1. (The transcription [O] reflects the author’s pronunciation of these items.)

Table 1: Word list for all consonant and vowel combinations

context:  i i        æ O        O O         O i        i i        æ æ
[p]       hippie     snap-on    top on      poppet     Pippin     hapax
[k]       tricky     tack-on    rock on     rock it    stickin’   back at
[kp]      chickpea   jackpot    crockpot    cockpit    stickpin   backpack
[kt]      -          blacktop   Choctaw     octave     victim     -
[pk]      Kripke     Epcot      popcorn     Hopkins    pipkin     -
[kk]      -          -          stock car   -          -          blackcap

The test items were put in the short and unoriginal carrier sentence “Say ___ again.” The entire list was recorded 10 times at a normal speech rate, and then 10 times at a fast speech rate. Speech rate was not controlled in any external way; however, the mechanics of data collection with the EMA system require frequent interruptions to save data. These provided a sort of “reset” every few sentences.

¹ A legitimate concern is that all of these words are actually compounds, so the [kp] sequences are not strictly tautomorphemic. In all of these cases, however, the compound is highly lexicalized. There are two possible scenarios here. The strongest interpretation of Articulatory Phonology should say that if a word is lexicalized, then it has its own score specified. A weaker version of this would be that the lexical entries for lexicalized compounds are somehow incomplete, and only contain “higher level” information about the meaning of the compound, with pointers to the gestural scores of the individual subparts, so that the entire compound is derived by morpheme concatenation. If the second is true, then variable overlap between the segments would be ambiguous between an effect of their heterosegmental status, their heterosyllabic status, and their heteromorphemic status. The other sequences like [kt] can act as a control here, since if the morpheme boundary in the compound turns out to be important, then the variability for [kp] sequences would be greater than that for [kt] sequences.

3.2 Set-up

Measurements were taken using the Carstens Articulograph AG100 system, with 5 receiving coils placed as follows:

1. Upper Lip

2. Lower Lip

3. Tongue tip (1cm back)

4. Tongue body (4.3cm back)

5. Bridge of the nose, as a reference.

These coil locations are essentially the same as those used by Maddieson (1993), in order to facilitate comparison. The data were recorded at a sampling rate of 312.5 Hz. The results presented here are from only one speaker (namely, the author).

3.3 Measurements

Durational and articulatory measurements were taken using the Carstens Emalyse software. Stop durations were measured from a zoomed waveform display. For articulations, six measurements were made for the relevant articulator: time and Y position (e.g., tongue height) for the gesture onset, gesture maximum, and gesture offset. The onset is defined as the turnaround point where the articulator begins its ascent, and the offset is the corresponding turnaround at the end of the gesture. These three points are illustrated in Figure 1.

In cases where there was ambiguity as to the exact onset, maximum, or offset, zero crossings in the velocity (i.e. the first derivative of the position) were used to decide the exact point. Since vowels and consonants may involve similar gestures (e.g. dorsal raising in both [i] and [k]), it can sometimes be difficult to isolate the gestures for one particular phoneme. Therefore, this study only reports the measurements from the vocalic context [O O], since low vowels are maximally different from the dorsal consonant [k]. In this context, there was amazingly little ambiguity surrounding the exact onset, peak, and offset of gestures.
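The landmark procedure described above can be sketched in code. This is only an illustration under stated assumptions, not the Emalyse implementation: it assumes a tongue-height trace sampled at the 312.5 Hz rate given in Section 3.2, approximates velocity by finite differences, and treats sign changes in that velocity as turnaround points. The function names and the synthetic trace are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 312.5  # sampling rate reported in Section 3.2


def velocity(positions, rate_hz=SAMPLE_RATE_HZ):
    """Finite-difference approximation of the first derivative of position."""
    return [(b - a) * rate_hz for a, b in zip(positions, positions[1:])]


def turnaround_indices(positions):
    """Sample indices at which the velocity changes sign (zero crossings)."""
    v = velocity(positions)
    points = []
    for i in range(1, len(v)):
        rising_starts = v[i - 1] < 0 <= v[i]   # local minimum at sample i
        falling_starts = v[i - 1] > 0 >= v[i]  # local maximum at sample i
        if rising_starts or falling_starts:
            points.append(i)
    return points


def gesture_landmarks(positions, rate_hz=SAMPLE_RATE_HZ):
    """Return (time in ms, height) for the onset, peak, and offset.

    Assumes the trace contains one raising gesture: the peak is the global
    maximum, the onset is the last turnaround before it, and the offset is
    the first turnaround after it.
    """
    peak = max(range(len(positions)), key=positions.__getitem__)
    turns = turnaround_indices(positions)
    onset = max(i for i in turns if i < peak)
    offset = min(i for i in turns if i > peak)
    ms_per_sample = 1000.0 / rate_hz
    indices = (("onset", onset), ("peak", peak), ("offset", offset))
    return {name: (i * ms_per_sample, positions[i]) for name, i in indices}


# Synthetic single raising gesture (hypothetical data, arbitrary height units).
trace = [-math.cos(2 * math.pi * (n - 10) / 50) for n in range(70)]
landmarks = gesture_landmarks(trace)
```

On real traces the velocity would normally be smoothed first, since measurement noise produces spurious sign changes; the sketch omits that step.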

Figure 1: Measurement points (gesture onset, gesture peak, and gesture offset, marked along the time axis)

4 Results

4.1 Consonant closure durations

Although not directly related to any of the hypotheses listed above, it is useful to start out with an overview of the acoustic durations, partly to show that they are typical values, and partly also so we know where to look for articulatory differences in intrinsic gesture duration or in the degree of overlap.

The average acoustic durations for all consonants and sequences at normal and fast speech rates are given in Table 2.

Table 2: Acoustic durations (ms) at two speech rates

             /k/    /p/    /kp/    /pk/    /kt/    /kk/
normal       91.4   88.1   145.8   157.0   135.4   158.8
fast         76.4   72.3   114.8   127.4   104.4   119.2
avg. diff.   84%    82%    79%     81%     77%     75%
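As a quick check on the bottom row of Table 2, the percentages can be recomputed from the two rows of means. The sketch below assumes that "avg. diff." is the fast-rate mean expressed as a percentage of the normal-rate mean; the text does not define the row, so that reading is an assumption, though the rounded values do match under it. The function name is hypothetical.

```python
# Mean acoustic durations in ms, copied from Table 2.
normal = {"k": 91.4, "p": 88.1, "kp": 145.8, "pk": 157.0, "kt": 135.4, "kk": 158.8}
fast = {"k": 76.4, "p": 72.3, "kp": 114.8, "pk": 127.4, "kt": 104.4, "kk": 119.2}


def fast_as_percent_of_normal(normal_ms, fast_ms):
    """Fast-rate duration as a rounded percentage of the normal-rate duration."""
    return {c: round(100 * fast_ms[c] / normal_ms[c]) for c in normal_ms}


# Assumed interpretation of the "avg. diff." row.
ratios = fast_as_percent_of_normal(normal, fast)
```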

Unsurprisingly, the sequences are uniformly longer than the single stops, and all stops and sequences are shorter in fast speech than in normal speech. A Bonferroni/Dunn multi-way comparison for all stops and sequences at the normal speech rate shows that acoustic duration only varies significantly (p < .0001) with the number of segments; /k/ is not significantly longer or shorter than /p/, and no sequence is significantly shorter than any other sequence; singleton /k/ and /p/ are both significantly shorter than all of the sequences.

Figure 2: Word duration vs. sequence duration (in msec), with word duration (ms) on the horizontal axis, sequence duration (ms) on the vertical axis, and separate series for pk, kt, and kp

Another interesting question to test regarding speaking rate and closure duration is whether the duration of stops and stop sequences varies in direct proportion to the overall duration of the word, or whether they “resist compression.” In other words, is a faster rate obtained by shortening all segments, or just some segments, such as the vowels? In Figure 2, the (acoustic) stop durations of the three sequences are plotted as a function of the overall word length. Overall, there is a fairly good correlation between word duration and stop duration (r(30) = .704). (A perfect correlation (r = 1) would mean that the sequences are shortened in exact proportion to the word duration, while a zero correlation would mean that the shortening is exclusively on other segments.)
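The two limiting cases of the correlation can be made concrete with a small sketch. The durations below are invented for illustration and are not measurements from this study; `pearson_r` is a hypothetical helper that implements the standard Pearson coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical word durations (ms), not data from this study.
word_ms = [400.0, 450.0, 500.0, 550.0, 600.0]

# Proportional shortening: the sequence is always 25% of the word.
proportional = [0.25 * w for w in word_ms]

# Resisting compression: sequence duration does not track word duration.
resistant = [140.0, 150.0, 130.0, 150.0, 140.0]

r_proportional = pearson_r(word_ms, proportional)
r_resistant = pearson_r(word_ms, resistant)
```

The first series yields a coefficient of 1 (up to floating-point error) and the second a coefficient of 0, so the reported value of .704 falls between the two extremes.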

Thus, we see that speech rate does have some effect on the timing of CC sequences in English. It is worth enumerating the logical possibilities for how these sequences are timed, although they should be fairly obvious. First, the fact that the sequences are not twice as long as single stops means that the component gestures must either be shorter or overlapped, or both. Second, the observed durational decrease in fast speech must be a result either of gestural compression or increased overlap between the gestures, or a combination of the two.

It is also worth comparing these durations to those reported by Maddieson (1993) for Ewe. In all cases, the acoustic closure durations for English (normal and fast) speech are considerably shorter than those for Ewe, where the speech was possibly quite slow and careful. In fact, the durations of the English sequences (130-150ms) are comparable to the values for singleton consonants in Ewe. However, we can also observe that the sequences are proportionally longer in English (around 150% of the singleton stops, compared to about 120% for Ewe). It is difficult to know how English and Ewe truly compare, however, since the data most likely come from different speech rates.

4.2 Constancy of gestures

The first hypothesis concerns the constancy of gestures across environments. This hypothesis takes the view that gestures in sequences are not different from the equivalent singleton gestures, and the reason why sequences are not twice as long acoustically is because of overlap and not gestural compression. However, before we can compare occurrences of a [k] in one environment with occurrences of [k] in a different environment, it is useful to know how consistent gestures are within a particular environment. (It is important to remember, however, that in this case ‘within an environment’ really means ‘across utterances of a single word’, since the corpus only contained one word for each consonant and vowel combination.) Figure 3 shows ten occurrences of the consonant [k] as a singleton consonant in the phrase ‘rock on’, and ten occurrences of [k] as the first element of the sequence [kp] in the word ‘crockpot’.

Figure 3: 10 /k/ gestures alone (left) and in a /kp/ sequence (right). Panels: (a) /k/ in ‘rock on’; (b) /k/ in ‘crockpot’. Each panel plots tongue height against time (ms).

Although the two gestures appear to differ somewhat in magnitude and time course, do they also differ in token-to-token variability? An Equality of Variance F-Test comparing the time of the gestural maximum, the height of the gestural maximum, the time of the gestural offset, and the height of the gestural offset for all gestures in all environments showed no significant differences in the within-group variance for any set of tokens. In other words, all of the gestures discussed here were pronounced with roughly the same consistency from token to token.² Therefore, we can now ask for each gesture what the effect of context is.
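For readers unfamiliar with the test, the statistic behind an equality-of-variance F-test is simply a ratio of two sample variances. The sketch below computes only that ratio; it is an illustration with invented peak times, not a reconstruction of the analysis reported here, and the function name is hypothetical. Obtaining a p-value would additionally require the F distribution with n − 1 degrees of freedom for each group, which is omitted.

```python
from statistics import variance


def variance_ratio(group_a, group_b):
    """F statistic for equality of variance: larger sample variance over smaller."""
    var_a, var_b = variance(group_a), variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical peak times (ms) for five tokens in each of two environments.
peak_times_singleton = [100.0, 102.0, 98.0, 101.0, 99.0]
peak_times_cluster = [100.0, 104.0, 96.0, 102.0, 98.0]

f_stat = variance_ratio(peak_times_singleton, peak_times_cluster)
```

A ratio near 1 indicates similar token-to-token consistency in the two environments; the invented data here give a ratio of 4.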

4.2.1 /k/ gestures

Because the variance is equal across all different environments, it is possible to average all the tokens for each environment in order to compare them visually. Figure 4 shows the average /k/ gestures in three different contexts: as a single stop (/k/), as the first component of a /kt/ sequence, and as the first component of a /kp/ sequence.

² This would be an interesting thing to test across a broader range of environments, however. It seems quite likely that coarticulation could influence gestural consistency so that some environments are less “stable” than others.

Figure 4: /k/ in 3 consonantal contexts (singleton /k/, /k/ in /kt/, and /k/ in /kp/; tongue height plotted against time in ms)

The fact that the /k/ in /kt/ is actually slightly longer on average is interesting, because acoustically /kt/ tends to be the shortest sequence. This reinforces the idea that the acoustic duration of sequences is more a function of gestural overlap than of gestural compression.

Gestures in various environments can differ numerically in four ways: the time of their gestural peaks, the height of their gestural peaks, the time of their gestural offsets, and the height of their gestural offsets. For /k/, the time of the gestural peak for a single /k/ is significantly later than that for a /k/ in a /kp/ sequence (Mean diff. = 32.3 ms, p<.0001), and it is significantly earlier than that for a /k/ in a /kt/ sequence (Mean diff. = -23.0 ms, p<.0001). This confirms the visual impression from Figure 4 that the peaks are timed differently for the three contexts. (A singleton /k/ also peaks significantly earlier than the /k/ in a /pk/ sequence (Mean diff. = -71.4, p<.0001), but this is to be expected because the two are in different syllabic positions.) It should also be reiterated that the /k/ in /kp/ has a significantly earlier peak than that in /kt/ (Mean diff. = -55.4 ms, p<.0001), which is unexpected because they are both in coda position in two-segment sequences. One’s first inclination might be to wonder whether it is an accident of too few tokens (10) in each condition.³ However, all of these differences are also observable and significant at the faster speech rate, so it cannot be purely accidental. In fact, the answer is probably related to the relative tongue height differences which are also shown in Figure 4, which are also all statistically significant (p<.0001).

It may be somewhat puzzling to think about tongue height differences for velar stops, since presumably there is always complete contact and the tongue can go no farther than the palate. We must remember, however, that the Y Position really only tells us the height of one particular point on the tongue, so in this case it is indirectly reflecting the frontness or backness of the /k/ closure. The tongue is farther forward and therefore the dorsal pellet is going higher into the arch of the palate in /kt/ sequences, where the /t/ gesture is pulling the tongue somewhat forward. On the other hand, the /k/ in /kp/ must have a more posterior, and therefore lower, closure than a plain /k/. I take this to confirm Maddieson’s observation that plain /k/ is made with a looping forward gesture, while the /k/ in a /k͡p/ phoneme does not have such a forward component. The lack of a velar loop in [kp] sequences can be explained by different aerodynamic pressures in this context: the simultaneous [k] and [p] closures mean that air is trapped between the two constriction points. Forward-looping tongue movement would cause compression, and is thus discouraged. (For more discussion of the aerodynamic pressures on articulatory loops, see Hoole, Munhall, and Mooshammer 1998.)

This height difference also partly explains the differences in timing of the gestures: in a /kt/ sequence, the tongue begins to rise but then gets pulled forward by the incipient /t/ gesture. Thus the point on the tongue is not moving straight up, but is rather moving diagonally to a point on the palate which is anyway higher. Therefore the /k/ in a /kt/ sequence will have a later and higher peak than simple /k/. For a /kp/, on the other hand, there is absolutely no forward motion – not even the forward motion of an articulatory loop. Therefore the peak is earliest in this context, because there is no competing force.⁴

³ One possible confound is that the conditions were read in the same order for all tokens, so if there was some general trend in speech rate from the beginning to the end of the list, that could mean that [kp] was always particularly fast and [kt] was always especially slow. In fact I doubt this, because the mechanics of the Carstens software require that the list be broken up into many little pieces, so there are frequent pauses and “resets” which would disrupt all but very local trends.

⁴ This only works if we assume that the forward motion of loops actually begins before full closure (or during it), since if it only starts after the release, then the gestural peak would not be affected.

The next question we can ask about the relative timecourse of the /k/ gestures is about their overall duration. Do earlier peaks also correspond to earlier endings? For /k/ in /kt/ sequences, the answer is straightforward: not only is the peak later than for a simple /k/, but so is the offset (Mean diff. = -57.0 ms, p<.0001). For /k/ in /kp/ sequences, however, the answer is surprising: although the peak is earlier than that of a simple /k/, its offset is at the same time (Mean diff. = -.6 ms for normal speech, -2.2 ms for fast speech). I have no real explanation for this fact, and indeed I am surprised that it holds across both speech rates. I can only guess that this means that full closure of /k/ in this environment is longer than the closure of a simple /k/. This makes sense if we assume that the target height is fixed somewhere above the velum, but the forward motion of the loop means that the tongue is moving into a slightly taller section of the mouth and therefore the stop duration is slightly shorter. At any rate, these results do confirm Maddieson’s observation for Ewe that within the same speech rate, gestures in a [kp] sequence are not pronounced more quickly in order to achieve a net closure which is less than twice that of a singleton stop.

Finally, we can also compare the /k/ gestures in a single stop and a “geminated” /kk/ sequence, as in the phrase ‘stock car’. In the case of /kk/, we do not appear to have two distinct /k/ gestures, but rather one single gesture which is more extreme. Figure 5 shows the difference between the single and “doubled” /k/ gestures.

Figure 5: single /k/ vs. doubled /kk/ (tongue height plotted against time in ms)

Why does the /kk/ end considerably higher than the single /k/? One possibility is that the /kk/ has more forward looping (because it is longer?) and therefore achieves its highest point later, at a higher, more anterior position on the palate. Furthermore, there may also be a confounding effect of /r/’s in the neighboring environments; in the single /k/, there was an /r/ in the onset preceding the /k/ (‘rock on’), while for the doubled /k/ there was an /r/ in the following coda (‘stock car’). We can guess that the coarticulatory effect of the /r/ could be to raise the tongue on one side of the gesture – in this case the beginning of the /k/ and the end of the /kk/. Since the graph normalizes the initial tongue height for the two gestures, these height differences are consolidated into the apparent final tongue position. Therefore, what looks like a substantial difference in tongue height at the end of these gestures is in fact probably no more than the accumulation of external coarticulatory effects.

So in summary: for /k/ gestures, there seems to be a fair amount of context-dependent articulatory variation. (I will call this variation allotropy, for want of a better term, and refer to individual contextual variants of a gesture as allotropes.) We know that this is strictly due to the context and not to some random imprecision of the tongue body, because the variance for each /k/ allotrope is no greater than that for any other articulator. This is not surprising – after all, /k/ is extremely susceptible to context-dependent allomorphy and historical change. The implications for gestures are as follows: before a /t/, the tongue is pulled forward and the /k/ closure is produced with the same point on the tongue, but at a more anterior point on the palate. This means that the tongue has farther and higher to go, and therefore the peak and offset are later in this context. Before /p/, on the other hand, the same point on the tongue actually makes the /k/ closure farther back than for a regular /k/, so the gestural peak comes earlier. I hypothesize that since the target for this gesture is actually above the palate, this early peak does not translate into a shorter overall gesture, but rather means that more time is spent slammed into the roof of the mouth. Therefore the overall duration for the pre-/p/ allotrope is no longer than that of a single /k/.

4.2.2 /p/ gestures

Although the details of /k/ gestures are highly dependent on context, the details of /p/ gestures are not. There were no significant differences in the timing or magnitude of /p/ gestures as a function of context, except for a significant durational difference between onset and coda realizations of /p/. Since there were no rounded vowels in the context, there are no other segments making demands on the lips, and therefore no noticeable coarticulation.

4.2.3 Conclusions for Hypothesis 1

So are the gestures in sequences the same as the gestures in single stops? Sure, if by the same we mean different. In this section I have tried to show that underlying phonemes are always realized with a reasonable contextualized version (allotrope) of some canonical movement, but this does not really confirm Hypothesis 1 that the gestures maintain a uniform time course and duration. In point of fact, Maddieson’s observations about looped vs. non-looped /k/ gestures show that this hypothesis was not exactly true of his data, either. A more refined statement of his findings would be that the dorsal gesture in a /k͡p/ phoneme is the contextually appropriate allotrope of a singleton /k/. The fact that the /k/ in /k͡p/ phonemes happened to be extremely similar to isolated /k/’s was mostly an accident of the fact that /p/ does not cause much coarticulation on the /k/. In general, however, this “recycling of gestures” hypothesis can be much more difficult to test, because it is not clear how much physical variation should be tolerated in considering whether two gestures are “identical.” Indeed, I surmise it is often the case that allotropic gestures in one language are phonemic in another.

What should we conclude from this failure to find a constant characterization of /k/ across different contexts? One possible problem is that Emalyse only allows measurement of movements in one direction (in the X or Y plane), when perhaps measuring Euclidean distance ($\sqrt{x^2 + y^2}$) could reveal a constant profile for /k/ in different environments. Informal inspection of the data using tangential velocity (the first derivative of Euclidean distance) instead of simple Y position indicates that the results would probably not be drastically different when measured in this way, however.
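To make the contrast between the one-dimensional and two-dimensional measures concrete, the following sketch computes tangential velocity from paired X and Y samples. It is an assumption-laden illustration rather than the procedure actually used for the informal inspection: it takes tangential velocity to be the Euclidean displacement between successive samples multiplied by the sampling rate, and the coordinates are invented.

```python
import math

SAMPLE_RATE_HZ = 312.5  # sampling rate reported in Section 3.2


def y_velocity(ys, rate_hz=SAMPLE_RATE_HZ):
    """Velocity along the Y dimension only, as in the single-axis measurements."""
    return [(y1 - y0) * rate_hz for y0, y1 in zip(ys, ys[1:])]


def tangential_velocity(xs, ys, rate_hz=SAMPLE_RATE_HZ):
    """Speed along the 2-D path: Euclidean displacement per second."""
    points = list(zip(xs, ys))
    return [math.hypot(x1 - x0, y1 - y0) * rate_hz
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


# Hypothetical diagonal movement (mm): 3 forward and 4 up per sample.
xs = [0.0, 3.0, 6.0]
ys = [0.0, 4.0, 8.0]

vertical = y_velocity(ys)               # sees only the upward component
tangential = tangential_velocity(xs, ys)  # sees the full diagonal movement
```

For a diagonal movement of the kind proposed for /k/ before /t/, the Y-only measure reports a smaller speed than the tangential measure, which is why the two could in principle give different profiles.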

A more fundamental problem could be that we are simply misusing Articulatory Phonology when we look at physical movements of articulators, and that in fact constancy would only emerge if we gathered neuromuscular data to reveal more abstract control gestures (Gestures). If this is true, then learning would have to proceed in a completely different way – for example, by massive trial and error until the right neurological signal is found to produce the correct results in all different segmental contexts (Guenther 1995).

Figure 6: Effect of speech rate and cluster type on degree of overlap (offset between gestural peaks in ms, at fast and normal speech rates, for pk, kt, and kp)

4.3 Overlap of gestures

The second hypothesis concerns the degree of overlap between the gestures. If Byrd is correct that what makes /k͡p/ phonemic in Ewe is a constancy of overlap across different speech rates, then for English we predict that the degree of overlap should vary with speech rate. In order to test this, the overlap for each sequence was defined as the time (in milliseconds) between the peak of the first gesture and the peak of the second gesture. A two-way ANOVA was performed, using speech rate and cluster identity (/kp/, /kt/, or /pk/) as factors, to test whether speech rate had an effect on the degree of overlap. The largest main effect was of cluster identity: /kt/ sequences are significantly more overlapped than either /kp/ or /pk/ clusters (F(1) = 66.27, p < .0001). However, a significant main effect of speech rate was also observed (F(1) = 9.23, p < .01). There was no significant interaction between speech rate and cluster identity, meaning that the degree of overlap was not adjusted differently for different clusters. The relative size of the main effects for cluster identity and speech rate can be seen in Figure 6 – the effect of cluster identity is much bigger.

This confirms the second hypothesis, that the degree of overlap between stops in a cluster varies depending on speech rate.

4.4 Intrinsic durations of gestures

The third hypothesis concerns the fact that sequences are not as long acoustically as the sum of the single component stops. The hypothesis is that in English, just as in Ewe, this shortening is accomplished purely by gestural overlap, and not by shortening the individual gestures. In other words, we predict that there should be no effect of speech rate on the intrinsic duration of gestures.

The mean gestural durations (i.e., the time between the onset and the offset) for each consonant in each speech rate are shown in Figure 7; Fisher’s PLSD post-hoc comparisons show that the small differences in gestural duration are in fact significant to some extent for all places ([k]: p < .01; [p]: p < .0001; [t]: p < .05).

[Figure 7 plots: three panels, (a) [k] durations, (b) [p] durations, (c) [t] durations; x-axis, speech rate (Normal, Fast); y-axis, mean gesture duration (ms)]

Figure 7: Mean durations of stop gestures in fast and normal speech

This disconfirms the third hypothesis, that individual gestures are not compressed to achieve shorter clusters, leaving all the work to increased overlap.

4.5 Similarity of sequences

The final hypothesis is that the degree of overlap for /kp/ sequences in English should not differ at any speech rate from the degree of overlap for other sequences. In order to test this, Bonferroni/Dunn multiway comparisons were performed between all different sequences, to test the hypothesis that the peak-to-peak duration is the same for all sequences. In fact, there are several significant differences in the degree of overlap. First, as was also observed above in the ANOVA for speech rate, the distance between peaks is significantly longer for /kp/ and /pk/ than for /kt/ (Mean diff. = 40.2 ms for /kp/, 37.8 ms for /pk/, p < .0001). This seems a little bit counterintuitive, because we might expect that having to share an articulator would make the offset greater, not less. However, if my conjectures above are correct about the /t/ gesture pulling the tongue forward mid-/k/, then the effect of the /t/ would be to delay the /k/ and make the peaks closer together. This difference is shown in Figure 8.
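As a rough illustration of the bookkeeping behind pairwise comparisons of this kind, the sketch below computes the mean difference for every pair of sequence types and the per-comparison significance threshold under a Bonferroni correction. It is not a reproduction of the analysis reported here: the offsets are invented, the helper names are hypothetical, the 0.05 family-wise level is an assumption, and the test statistics themselves are omitted.

```python
from itertools import combinations

def bonferroni_alpha(n_groups, family_alpha=0.05):
    """Number of pairwise comparisons and the per-comparison alpha level."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return n_comparisons, family_alpha / n_comparisons

def pairwise_mean_differences(groups):
    """Difference in means (first minus second) for every pair of groups."""
    means = {name: sum(values) / len(values) for name, values in groups.items()}
    return {(a, b): means[a] - means[b] for a, b in combinations(groups, 2)}

# Invented peak-to-peak offsets in ms, for illustration only.
offsets = {
    "kp": [50.0, 54.0, 46.0],
    "pk": [48.0, 52.0, 44.0],
    "kt": [10.0, 12.0, 8.0],
}

n_comparisons, alpha_per_test = bonferroni_alpha(len(offsets))
diffs = pairwise_mean_differences(offsets)
print(n_comparisons, alpha_per_test, diffs)
```

With three sequence types there are three pairwise comparisons, so each one would be evaluated against a stricter threshold than the family-wise level.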

[Figure 8 plots: two panels, (a) gestural timing in /kp/ (peak displacement ≈ 50 ms), (b) gestural timing in /kt/ (peaks nearly simultaneous); x-axis, time (ms); y-axis, articulator displacement]

Figure 8: Idealized coordination in sequences

The remaining sequences (/kp/ and /pk/) do not differ significantly from each other with respect to overlap at either speech rate.

5 Discussion

Maddieson (1993) compared /k/ gestures across two different phonemes and found a certain amount of constancy in spite of the fact that different phonemes were involved. In this study, on the other hand, I have compared /k/ gestures which are all the same underlying phoneme, yet exhibit a fair amount of variation from context to context.

The motivation of this study was to test whether in English, as in Ewe, sequences of stops are nothing more than two stop gestures in a particular timing relation. The results in section 4.2 showed that the version of /k/ which appears in [akpa] is not exactly the same as the version of /k/ which appears in [akta] or simple [aka]. However, the differences which are observed (a lower and earlier gestural peak) may be explainable as simple coarticulation, caused by the lack of articulatory loops when there is a simultaneous [p] closure. Furthermore, this difference (lack of loops) was also observed by Maddieson in the Ewe data. Therefore, I believe it is possible to conclude at least that the “gestural recycling” hypothesis is as true in English as it is in Ewe. (Note that the opposite conclusion would be very surprising: that Ewe uses the same [k] gesture in two separate phonemes, while English has two different gestures to express sub-allophonic variants of the same phoneme.)

The remaining hypotheses were a test of Byrd’s hypothesis that the timing relation between gestures should be more stable when they comprise a single segment than when they are heterosegmental. In section 4.3, it was shown that the degree of overlap does vary depending on speech rate. This is not the whole story, however; in section 4.4 it was shown that speech rate also affects the intrinsic duration of gestures. This result is not terribly surprising; after all, the duration values in section 4.1 showed that singleton stops are also shorter in faster speech, and in this case the entire effect must be due to reducing the duration of the gesture.

The picture which emerges, therefore, is one of two cooperating factors. Consonants are reduced in duration in fast speech in English, and this can be accomplished by shortening the stop gestures; in fact, this is the only means available for singleton stops. When multiple stops are involved, however, the overlap between them is also increased (i.e., their peaks are brought closer together).

The cooperation of these two factors (increased overlap and faster movements) can be demonstrated further by constructing a model with multiple regression. The graphs in Figure 9 show that taken independently, the degree of overlap and the speed of the first gesture (i.e., time from onset to peak) are only mediocre predictors of the entire acoustic duration (r = .444 and r = .184, respectively). When the two effects are combined, however, the predictions for acoustic duration are much better (r = .753).
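The kind of two-predictor model described above can be sketched as an ordinary least-squares fit. The code below is a generic illustration and not the software or data used in this study: the function names are hypothetical, and the toy values are constructed so that duration is an exact linear function of the two predictors, which is why the fit is perfect here.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns the coefficients [b0, b1, b2] and the multiple correlation R.
    """
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * target for r, target in zip(rows, y)) for i in range(3)]
    coeffs = solve_linear_system(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_total = sum((t - mean_y) ** 2 for t in y)
    ss_resid = sum((t - p) ** 2 for t, p in zip(y, predicted))
    multiple_r = math.sqrt(max(0.0, 1.0 - ss_resid / ss_total))
    return coeffs, multiple_r

# Toy values in ms (invented): duration = 100 + 0.5*offset + 0.3*onset_to_peak
peak_offset = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
onset_to_peak = [50.0, 80.0, 60.0, 90.0, 70.0, 100.0]
duration = [120.0, 134.0, 133.0, 147.0, 146.0, 160.0]

coeffs, multiple_r = fit_two_predictor_regression(peak_offset, onset_to_peak, duration)
print(coeffs, multiple_r)
```

With real measurements the fit would be noisy, and the multiple R can be compared against the correlation obtained from either predictor alone.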

[Figure 9 plots: two scatterplots with acoustic duration (ms) on the y-axis; (a) Effect of overlap, x-axis: gestural overlap (ms); (b) Effect of faster movement, x-axis: segment 1 onset-to-peak (ms)]

Figure 9: Independent effects of overlap and gestural compression

[Figure 10 plot: x-axis, predicted acoustic duration (ms); y-axis, acoustic duration (ms)]

Figure 10: Predicting acoustic duration with combined overlap and gestural compression

How does this relate to Ewe? As far as the profile of gestures goes (Hypothesis 1), English does not seem to differ from Ewe. This result is similar to a finding by Browman and Goldstein (1992), in which they compared the labial gestures of nasal+stop clusters in English with the prenasalized stops of Chaga. In both acoustic and articulatory measurements, the sequences in English were virtually indistinguishable from complex phonemes of Chaga (with the one surprising exception that Chaga /m͡p/ actually appears to be more bipartite than any other English or Chaga stop or stop sequence). Both Browman and Goldstein’s findings, and the current findings, are consistent with the general hypothesis of gestural economy: larger linguistic units are built from more basic articulatory plans.

Unfortunately, for the remaining hypotheses concerning variability across different speech rates, Maddieson does not report any relevant data. Note that if Ewe /k͡p/ does act like English /k/+/p/, then it might be possible to test this using just Maddieson’s data from a single speech rate, exploiting whatever variation in acoustic duration he had within his single, uncontrolled speech rate. If his data revealed no significant variability in overlap, however, it would be necessary to collect more Ewe data, including labiovelars at a variety of speech rates. If Byrd is correct about increased timing stability being a feature of complex segments, then we should find little difference between fast and slow pronunciations of [k͡p] in Ewe. If, on the other hand, Ewe turns out to look more like English, then we would be forced to conclude that segmenthood is an essentially distributional fact which is neither learned from phonetic cues nor reflected in them.

References

Browman, C. and L. Goldstein (1992). Articulatory phonology: an overview. Phonetica 49, 155–180.

Browman, C. P. and L. Goldstein (1986). Towards an articulatory phonology. Phonology Yearbook 3, 219–252.

Byrd, D. (1994). Articulatory Timing in English Consonant Sequences. Ph.D. thesis, UCLA.

Cho, T. (1999). The lexical specification of intergestural timing and gestural overlap. UCLA ms.

Guenther, F. (1995). A modeling framework for speech motor development and kinematic articulator control. Proceedings of the XIIIth International Congress of Phonetic Sciences 2, 92–99.

Hoole, P., K. Munhall, and C. Mooshammer (1998). Do air-stream mechanisms influence tongue movement paths? (xxx probably published by now).

Ladefoged, P. (1968). A Phonetic Study of West African Languages (2nd ed.). Cambridge: Cambridge University Press.

Maddieson, I. (1993). Investigating Ewe articulations with electromagnetic articulography. UCLA Working Papers in Phonetics 94, 22–53.

Maddieson, I. and P. Ladefoged (1989). Multiply-articulated segments and the feature hierarchy. UCLA Working Papers in Phonetics 72, 116–138.

Smith, C. (1993). Prosodic patterns in the coordination of vowel and consonant gestures. Papers in Laboratory Phonology IV, 205–222.
