A study of syllable timing - Royal Institute of Technology · STL-QPSR 1/1971 B. A STUDY OF...

Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

A study of syllable timing
Rapp-Holmgren, K.

journal: STL-QPSR
volume: 12
number: 1
year: 1971
pages: 014-019

http://www.speech.kth.se/qpsr

http://www.speech.kth.se

STL-QPSR 1/1971

B. A STUDY OF SYLLABLE TIMING*

K. Rapp

1. Introduction

When scanning a poem we note that language has rhythm (rhythm = pattern of a temporal sequence (Allen, 1968)). The pattern of the temporal sequence is built up by metric feet, consisting of stressed and unstressed beats according to the rules of metrics. Words may easily be selected to fit this pattern. Words which match a metric foot usually contain the same number of syllables, and the position of the stressed syllable is the same as that of the stressed beat. This is not true of all languages, but it holds at least for English, Swedish, Dutch, and several others.

What "events" of speech are connected with the stressed beats or downbeats of a rhythm, and where exactly are these beats located in speech? Allen (1970) made some experiments to find the answers to these questions. He asked subjects to tap their fingers or match an auditive pulse to given syllables of an utterance. The results of these experiments point to the release of a consonant or the onset of a following vowel as the place of the beat location. However, subjects varied as to the preferred place of beat location. He also found that beats preceded the vowel onset by an amount positively correlated with the length of the prevocalic consonant. In another experiment he asked subjects to produce speech synchronously with equidistant auditive pulses. Subjects in this experiment also varied as to the absolute location of speech in relation to pulses. However, they showed the same relations with respect to how pulse locations differed as a function of the phonetic composition of the word. All subjects placed the beats about 30-70 msec earlier in words with voiceless consonants before the vowel than in words with voiced consonants. This difference has also been found in an experiment where a Swedish subject repeated nonsense words synchronously with equidistant auditive pulses. The words were [a'ta:, a'sa:, a'da:, a'na:, and a'la:], with 100 items of each word. The results show that in words with voiceless consonants the pulse occurs on the average 20 msec later than the vowel onset, and for voiced consonants about 50 msec after the vowel onset (Lindblom, 1970).

The present study is an extended version of the work referred to above. It is intended to investigate beat location and its relation to various aspects of the timing of speech segments.

* Thesis work associated with the Dept. of Phonetics, Stockholm University

2. Experiment

2.1 Speech material

For the purpose of investigating temporal relations within Swedish syllables, disyllabic nonsense words with ultimate stress were constructed. The words were:

[a'Ca:d], where C = [s], [t], [d], [l], [n], [st], and [str]

Nonsense words were preferred to meaningful items since irrelevant sources of variation can then be controlled. Thus the vowels and the last consonant are the same in all words. The place of articulation is also controlled, all the intervocalic consonants having a dental articulation. These words meet all the requirements of Swedish phonotactic and suprasegmental rules.

The words were read 100 times each. For seven test words this makes 700 items, which were ordered randomly and organized in groups of five words. More than two identical words in a row were not allowed. The speech material was presented on 12 lists.
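The ordering constraints described above can be illustrated with a short sketch. The report does not say how the randomization was actually carried out, so the sampling strategy below (weighted draws with a restart on a dead end) and the names `build_sequence` and `split_into_groups` are assumptions made only for illustration. The distribution of groups over the 12 lists is not specified in the text and is not modelled.

```python
import random
from collections import Counter

# Test words, repetition count, group size, and run limit are taken from the text.
WORDS = ["a'sa:d", "a'ta:d", "a'da:d", "a'la:d", "a'na:d", "a'sta:d", "a'stra:d"]
REPETITIONS = 100
GROUP_SIZE = 5
MAX_RUN = 2  # no more than two identical words in a row


def build_sequence(rng):
    """Return a random order of all items with no word repeated more than MAX_RUN times in a row.

    Hypothetical procedure: draw words with probability proportional to the
    number of remaining copies, and start over if only a blocked word is left.
    """
    total = len(WORDS) * REPETITIONS
    while True:
        remaining = Counter({w: REPETITIONS for w in WORDS})
        seq = []
        while len(seq) < total:
            tail = seq[-MAX_RUN:]
            blocked = tail[0] if len(tail) == MAX_RUN and len(set(tail)) == 1 else None
            candidates = [w for w in WORDS if remaining[w] > 0 and w != blocked]
            if not candidates:
                break  # dead end: restart the whole draw
            weights = [remaining[w] for w in candidates]
            word = rng.choices(candidates, weights=weights, k=1)[0]
            seq.append(word)
            remaining[word] -= 1
        else:
            return seq


def split_into_groups(seq, size=GROUP_SIZE):
    """Cut the sequence into consecutive groups of `size` words."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


sequence = build_sequence(random.Random(0))
groups = split_into_groups(sequence)
print(len(sequence), "items in", len(groups), "groups")
```

With seven words read 100 times each, the sketch yields 700 items in 140 groups of five.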

2.2 Experimental subjects

Three Swedish male subjects recorded these lists of words. They had all served as subjects for other tests several times before and were used to the equipment and the situation. None of them had any speech disorders or marked accent.

2.3 Experimental procedures

The equipment used for the experiment is shown schematically in Fig. I-B-1. The recordings were made under high quality conditions in an anechoic chamber at KTH. The subject had earphones placed just behind his ears to make it possible for him to hear his own speech as well as messages from outside the room. Equidistant auditive pulses were presented to him through the earphones at a rate of about 2 pulses/sec, which was the same for all subjects. A microphone at a distance of about 3 dm was used. The subject was instructed to read the words in synchrony with every second pulse. He read the test material in groups of five words, pausing after each list. The pulse signal and the speech signal were recorded on separate channels. The task was considered an easy one. One of the subjects (no. 2) complained that the rate of pulses was a bit too fast.

Fig. I-B-1. Equipment for the experiment (block diagram; one block is labelled PULSE GENERATOR).

3. Results

For segmentation purposes the recorded material was processed with a mingograph. The four curves used were: (1) speech signal oscillogram, (2) pulse curve, (3) duplex oscillogram, and (4) fundamental frequency curve. The paper speed used was 100 mm/sec.

Measurements were made of acoustic segment durations, location of pulses in relation to speech segments, and Fo contours.

3.1 Segment durations

Measurements were made from the duplex oscillogram according to Table I-B-I. (For an example of the segmentation procedure, see Fig. I-B-2.)

Table I-B-I. Segments recognized in the test words. Reference point indicated by two vertical lines.

Test words (rows): [a'sa:d], [a'ta:d], [a'da:d], [a'la:d], [a'na:d], [a'sta:d], [a'stra:d].

[Table body illegible in the source; surviving segment labels: friction, occlusion, aspiration.]

The measurements were made with a mm-graded ruler with an accuracy of ± 5 msec (1 mm = 10 msec). 40 items of each test word from each subject were measured and the means of the segment durations were calculated. The reason for not measuring all 100 items was that the durations of individual items varied very little and another 60 items would not have changed the mean appreciably.
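The conversion from ruler readings to durations follows directly from the paper speed of 100 mm/sec stated in the text: 1 mm of paper corresponds to 10 msec, so the stated accuracy of ± 5 msec corresponds to reading the ruler to the nearest 0.5 mm. A minimal sketch of that arithmetic is given here; the readings in `readings_mm` are invented for illustration and are not data from the study.

```python
from statistics import mean

PAPER_SPEED_MM_PER_SEC = 100.0  # paper speed stated in the text


def mm_to_msec(length_mm, paper_speed=PAPER_SPEED_MM_PER_SEC):
    """Convert a length measured on the paper record (mm) to a duration (msec)."""
    return length_mm * 1000.0 / paper_speed


# Hypothetical ruler readings for one segment, in mm (not from the study).
readings_mm = [11.0, 12.0, 11.5, 12.5]

durations_msec = [mm_to_msec(x) for x in readings_mm]
mean_duration_msec = mean(durations_msec)

print("durations (msec):", durations_msec)
print("mean duration (msec):", mean_duration_msec)
```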

3.2 Pulse locations in relation to segment durations

The pulses were expected to occur close to the beginning of the stressed vowel. Thus a reference point was chosen which was close to the beginning of the vowel (see Table I-B-I). From the reference point a vertical line was marked on the pulse curve and the duration between the beginning of the

Fig. I-B-2. Example of segmentation of one item of the word.

Fig. I-B-4a. Distribution of time-markers in relation to the acoustic segments of the words [a'sa:d], [a'ta:d], [a'sta:d], and [a'stra:d] as produced by subject no. 2. (Axes: number of observations; time in msec, 0-600.)

see that pulses are approximately normally distributed around the mean. This might indicate that some point in time is aimed at, i.e. considered to be "the" point of beat location, but that for some reason this temporal target is reached with some imprecision. The large difference between speakers shows that this point is not necessarily the same in absolute terms.

The bottom part of Fig. I-B-6 shows the mean durations of acoustic segments for three speakers in relation to the means of the pulse distributions. We can see that there are large variations of duration in the intervocalic consonants and consonant clusters. Single consonants are shorter when voiced. Consonant clusters are a bit longer than the voiceless single consonants. The consonants of a cluster are not of the same length as in isolation but are shortened as a function of the number of consonants in the cluster. In words with a long consonantal portion the pulse is located earlier in relation to the stressed vowel than in words with short consonantal portions, as can be seen from Fig. I-B-7.

Consonants and consonant clusters vary in duration between 100 and 225 msec. The total length of the words varies between 625 and 670 msec. Since the latter variation is smaller, we can infer that a compensation has taken place in the other segments. The compensation shows up as a slightly negative correlation (see Lehiste, 1970) between intervocalic consonants and the other segments of the word (Fig. I-B-8).
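The compensation argument can be made concrete with a correlation coefficient between consonant duration and the duration of the rest of the word. The sketch below is only an illustration: the report does not state which statistic was computed, `pearson_r` is a hand-written helper, and the five value pairs are invented so that they stay within the ranges quoted in the text (consonants 100-225 msec, word totals 625-670 msec).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean durations in msec (not data from the study).
consonant_msec = [100, 130, 160, 190, 225]   # intervocalic consonant or cluster
remainder_msec = [525, 510, 495, 470, 445]   # all other segments of the word

total_msec = [c + r for c, r in zip(consonant_msec, remainder_msec)]
r = pearson_r(consonant_msec, remainder_msec)

# A negative r means the other segments shrink as the consonantal portion grows,
# which is why the word totals spread less than the consonant durations do.
print("word totals (msec):", total_msec)
print("correlation:", round(r, 3))
```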

4.2 Fo contours

Results of the Fo measurements are shown in Fig. I-B-9. The upper part of this figure shows Fo contours for 7 words and subject no. 2. The Fo contours have been aligned with respect to the point where they all run through 125 Hz. All curves fall between the two extreme curves. All words with voiced consonants have a lower initial Fo than the words with unvoiced consonants. The lower part shows how the acoustic segments of the words are organized when the point of 125 Hz is the basis of comparison. During the stressed vowels the course of Fo is evidently more or less identical for all words. The onset of the vowel rather than the beginning of the word seems to determine the timing of the Fo contour and of the final [d].

5. Chain model and preplanning model

There are two models of speech timing to which these data will be related. One is the chain model, an extreme version of which presupposes that durationally a sequence of speech is simply the sum of its constituent parts.

Fig. I-B-8. Duration of consonants or consonant clusters expressed as a function of the duration of [a:], [a], and [d]. (Axis label: duration of [a:], [a], or [d] in msec.)

To exemplify, the duration of [st] would be expected to be equal to the sum of the inherent durations of [s] and [t].

The other model is the preplanning model. According to one interpretation of this model a given amount of time is assigned to a certain unit of speech (e.g. syllable) and the durations of its constituent parts are adjusted so as to make its duration constant.
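The contrast between the two models can be stated as a small calculation. The sketch below is a deliberately simplified reading: the text only says that constituent durations "are adjusted", so the proportional rescaling in `preplanned_durations` is an assumption, and the inherent durations are invented for illustration.

```python
def chain_duration(inherent_msec):
    """Chain model (extreme version): the unit lasts as long as the sum of its parts."""
    return sum(inherent_msec.values())


def preplanned_durations(inherent_msec, unit_msec):
    """Preplanning model (one interpretation): fit the parts into a fixed unit duration.

    Assumption: every segment is rescaled by the same factor. The source does
    not specify how the adjustment is shared among the segments.
    """
    scale = unit_msec / sum(inherent_msec.values())
    return {segment: d * scale for segment, d in inherent_msec.items()}


# Hypothetical inherent durations in msec (not data from the study).
single = {"t": 100, "a:": 250}
cluster = {"s": 120, "t": 100, "a:": 250}

# Chain model: adding [s] lengthens the unit by the full inherent duration of [s].
print(chain_duration(single), chain_duration(cluster))

# Preplanning model: the unit keeps its assigned duration and the parts shrink.
print(preplanned_durations(cluster, unit_msec=350))
```

Under the chain model the cluster unit is longer than the single-consonant unit by exactly the inherent duration of the added consonant; under the preplanning reading the total stays fixed and every part shortens.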

Fig. I-B-8 provides evidence that can be interpreted in this direction. The durational variation of the test consonants is clearly compensated for, albeit only to a slight extent.

On the other hand, Fig. I-B-9 demonstrates that the Fo contours are fairly well synchronized during the [a:] vowels in spite of the variations in consonant duration. This seems to lend support to the chain model. Consequently, a correct model of speech timing is likely to incorporate features of both principles.

References:

Allen, G. (1968): "The place of rhythm in a theory of language", WPP, UCLA, No. 10, pp. 60-64.

Allen, G. (1970): "The location of rhythmic stress beats in English: An experimental study", WPP, UCLA, No. 14, pp. 80-132.

Eggermont, J. (1969): "Location of the syllable beat in routine scansion recitations of a Dutch poem", IPO Annual Progress Rep., No. 4, pp. 60-69.

Lehiste, I. (1971): "Temporal organization of spoken language", in Form and Substance (Copenhagen), pp. 159-169.

Lindblom, B. (1970): "Temporal organization of syllabic processes", invited paper presented at the ASA meeting at Atlantic City, April 1970.

Kozhevnikov, V.A. and Chistovich, L.A. (1965): Speech: Articulation and Perception (transl. from Russian, US Dept. of Commerce, Washington, D.C.).

Öhman, S. (1965): "On the coordination of articulatory and phonatory activity in the production of Swedish tonal accents", STL-QPSR 2/1965, pp. 14-19.
