A study of syllable timing - Royal Institute of Technology · STL-QPSR 1/1971 B. A STUDY OF...
Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
A study of syllable timing
Rapp-Holmgren, K.
journal: STL-QPSR
volume: 12
number: 1
year: 1971
pages: 014-019
http://www.speech.kth.se/qpsr
STL-QPSR 1/1971
B. A STUDY OF SYLLABLE TIMING*
K. Rapp
1. Introduction
When scanning a poem we note that language has rhythm (rhythm = pattern
of a temporal sequence (Allen, 1968)). The pattern of the temporal sequence
is built up of metric feet, consisting of stressed and unstressed beats
according to the rules of metrics. Words may easily be selected to fit this
pattern. Words which match a metric foot usually contain the same number of
syllables, and the position of the stressed syllable is the same as that of
the stressed beat. This is not true of all languages, but it holds at least
for English, Swedish, Dutch, and several others.
What "events" of speech are connected with the stressed beats or down-
beats of a rhythm, and where exactly are these beats located in speech?
Allen (1970) made some experiments to find the answers to these questions.
He asked subjects to tap their fingers or match an auditive pulse to given
syllables of an utterance. The results of these experiments point to the
release of a consonant or the onset of a following vowel as the place of the
beat location. However, subjects varied as to the preferred place of beat
location. He also found that beats preceded the vowel onset by an amount
positively correlated with the length of the prevocalic consonant. In another
experiment he asked subjects to produce speech synchronously with equi-
distant auditive pulses. Subjects in this experiment also varied as to the
absolute location of speech in relation to pulses. However, they showed the
same relations with respect to how pulse locations differed as a function of
the phonetic composition of the word. All subjects placed the beats about
30-70 msec earlier in the words with voiceless consonants before the vowel
than in words with voiced consonants. This difference has also been found
in an experiment where a Swedish subject repeated nonsense words syn-
chronously with equidistant auditive pulses. The words were [a'ta:, a'sa:,
a'da:, a'na:, and a'la:] with 100 items of each word. The results show that
in words with voiceless consonants the pulse occurs on the average 20 msec
later than the vowel onset and for voiced consonants about 50 msec after the
vowel onset (Lindblom, 1970).
The present study is an extended version of the work referred to above.
It is intended to investigate beat location and its relation to various aspects
of timing of speech segments.
* Thesis work associated with the Dept. of Phonetics, Stockholm University
2. Experiment
2.1 Speech material
For the purpose of investigating temporal relations within Swedish syl-
lables, disyllabic nonsense words with ultimate stress were constructed. The
words were:
[a'Ca:d] where C = [s], [t], [d], [l], [n], [st], and [str]
Nonsense words were preferred to meaningful items since the irrelevant
sources of variation can be controlled. Thus the vowels and the last con-
sonant are the same in all words. The place of articulation is also controlled,
all the intervocalic consonants having a dental articulation. These words
meet all the requirements of Swedish phonotactic and suprasegmental rules.
The words were read 100 times each. For seven test words this makes
700 items which were ordered randomly and organized in groups of five words.
More than two identical words in a row were not allowed. The speech mate-
rial was presented on 12 lists.
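
The ordering constraint described above can be illustrated with a minimal sketch. The report does not say how the random order was actually produced, so the procedure below (weighted random draws with a restart on a dead end) is an assumption, as are the names `randomized_order` and `longest_run` and the ASCII spellings of the test words.

```python
import random
from collections import Counter

# Test words from section 2.1, written in plain ASCII for illustration.
WORDS = ["a'sa:d", "a'ta:d", "a'da:d", "a'la:d", "a'na:d", "a'sta:d", "a'stra:d"]


def longest_run(seq):
    """Return the length of the longest run of identical neighbours."""
    longest, run, prev = 0, 0, None
    for item in seq:
        run = run + 1 if item == prev else 1
        longest = max(longest, run)
        prev = item
    return longest


def randomized_order(words, repeats, max_identical=2, seed=0):
    """Draw a random order in which no word occurs more than
    `max_identical` times in a row (hypothetical procedure)."""
    rng = random.Random(seed)
    total = len(words) * repeats
    while True:  # start over if the draw runs into a dead end
        remaining = Counter({w: repeats for w in words})
        order = []
        while len(order) < total:
            tail = order[-max_identical:]
            blocked = tail[0] if len(tail) == max_identical and len(set(tail)) == 1 else None
            candidates = [w for w in words if remaining[w] > 0 and w != blocked]
            if not candidates:
                break
            weights = [remaining[w] for w in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
            order.append(choice)
            remaining[choice] -= 1
        if len(order) == total:
            return order


order = randomized_order(WORDS, repeats=100)
# Groups of five words, as read by the subjects.
groups = [order[i:i + 5] for i in range(0, len(order), 5)]
```

Weighting the draw by the number of items still to be placed keeps the words evenly spread, which makes a dead end near the end of the list unlikely.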
2.2 Experimental subjects
Three Swedish male subjects recorded these lists of words. They had all
served as subjects for other tests several times before and were used to the
equipment and the situation. None of them had any speech disorders or
marked accent.
2.3 Experimental procedures
The equipment used for the experiment is shown schematically in
Fig. I-B-1. The recordings were made under high quality conditions in an
anechoic chamber at KTH. The subject had earphones placed just behind
his ears to make it possible for him to hear his own speech as well as mes-
sages from outside the room. Equidistant auditive pulses were presented to
him through the earphones at a rate of about 2 pulses/sec, which was the
same for all subjects. A microphone at a distance of about 3 dm was used.
The subject was instructed to read the words in synchrony with every second
pulse. The subject read the test materials in groups of five words, pausing
after each list. The pulse signal and the speech signal were recorded on
separate channels. The task was considered an easy one. One of the sub-
jects (no. 2) complained that the rate of pulses was a bit too fast.
3. Results
For segmentation purposes the recorded material was processed with a
mingograph. The four curves used were: (1) speech signal oscillogram,
(2) pulse curve, (3) duplex oscillogram, and (4) fundamental frequency curve.
The paper speed used was 100 mm/sec.
Measurements were made of acoustic segment durations, location of
pulses in relation to speech segments, and Fo contours.
3.1 Segment durations
Measurements were made from the duplex oscillogram according to
Table I-B-1. (For an example of the segmentation procedures, see Fig.
I-B-2.)
[Table I-B-1: one row for each of the test words [a'sa:d], [a'ta:d], [a'da:d], [a'la:d], [a'na:d], [a'sta:d], and [a'stra:d]; the consonantal segments distinguished are friction, occlusion, aspiration, and [r].]
Table I-B-1. Segments recognized in the test words. Reference point indicated by two vertical lines.
The measurements were made with a mm-graded ruler with an accuracy
of ± 5 msec (1 mm = 10 msec). 40 items of each test word from each sub-
ject were measured and the means of the segment durations were calculated.
The reason for not measuring all 100 items was that the durations of individual
items varied very little and another 60 items would not have changed the
mean appreciably.
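
As a worked illustration of the conversion above: at a paper speed of 100 mm/sec, 1 mm corresponds to 10 msec, so a reading resolution of half a millimetre gives the stated ± 5 msec. The sketch below assumes that the ruler was read to the nearest half millimetre; the readings are invented for illustration and are not data from the study.

```python
from statistics import mean

PAPER_SPEED_MM_PER_SEC = 100.0  # mingograph paper speed stated in the text


def mm_to_msec(length_mm, paper_speed=PAPER_SPEED_MM_PER_SEC):
    """Convert a length measured on the paper record to a duration in msec."""
    return length_mm * 1000.0 / paper_speed


# Hypothetical ruler readings (mm) of one segment in four items.
readings_mm = [12.0, 12.5, 11.5, 12.0]
durations_msec = [mm_to_msec(x) for x in readings_mm]
mean_duration_msec = mean(durations_msec)

# Assumed reading resolution of 0.5 mm, expressed in msec.
resolution_msec = mm_to_msec(0.5)
```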
3.2 Pulse locations in relation to segment durations
The pulses were expected to occur close to the beginning of the stressed
vowel. Thus a reference point was chosen which was close to the beginning
of the vowel (see Table I-B-1). From the reference point a vertical line
was marked on the pulse curve and the duration between the beginning of the
Fig. I-B-4a. Distribution of time-markers in relation to the acoustic segments of the words [a'sa:d], [a'ta:d], [a'sta:d], and [a'stra:d] as produced by subject no. 2. (Time axis in msec.)
see that pulses are approximately normally distributed around the mean.
This might indicate that some point in time is aimed at, i.e. considered to
be "the" point of beat location, but that for some reason this temporal target
is reached with some imprecision. The large difference between speakers
shows that this point is not necessarily the same in absolute terms.
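
The description of the pulse distributions can be made concrete with a small sketch that expresses each pulse as an offset from the reference point and summarizes the offsets per speaker. The sign convention (positive = pulse after the reference point), the function names, and all numbers are assumptions made for illustration. None of the values come from the study.

```python
from statistics import mean, stdev


def pulse_offsets(pulse_times, reference_times):
    """Offset of each pulse from its reference point in msec
    (assumed convention: positive means the pulse comes later)."""
    return [p - r for p, r in zip(pulse_times, reference_times)]


def summarize(offsets):
    """Mean and sample standard deviation of a set of offsets."""
    return mean(offsets), stdev(offsets)


# Invented timings (msec) for two hypothetical speakers.
reference = [1000.0, 2000.0, 3000.0, 4000.0]
speaker_a = pulse_offsets([1010.0, 2025.0, 2990.0, 4015.0], reference)
speaker_b = pulse_offsets([1050.0, 2065.0, 3030.0, 4055.0], reference)

mean_a, sd_a = summarize(speaker_a)
mean_b, sd_b = summarize(speaker_b)
```

In this invented example the two speakers show the same scatter around different means, which is the pattern the text describes: a temporal target reached with some imprecision, but not the same target in absolute terms for every speaker.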
The bottom part of Fig. I-B-6 shows the mean durations of acoustic seg-
ments for three speakers in relation to the means of the pulse distributions.
We can see that there are large variations of duration in the intervocalic
consonants and consonant clusters. Single consonants are shorter when
voiced. Consonant clusters are a bit longer than the voiceless single con-
sonants. The consonants of the cluster are not of the same length as in iso-
lation but are shortened as a function of the number of consonants in the
cluster. In words with a long consonantal portion the pulse is located earlier
in relation to the stressed vowel than in words with short consonantal por-
tions, as can be seen from Fig. I-B-7.
Consonants and consonant clusters vary between 100 and 225 msec. The total
length of the words varies between 625 and 670 msec. Since the latter variation
is smaller, we can infer that a compensation has taken place in the other seg-
ments. The compensation shows up as a slightly negative correlation between
intervocalic consonants and the other segments of the word (Fig. I-B-8;
see Lehiste, 1970).
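
The compensation argument can be sketched numerically: if the remaining segments shrink as the consonantal portion grows, the two are negatively correlated and the word total varies less than the consonants do. The durations below are invented and chosen only to fall within the ranges quoted in the text, so the strength of the correlation they produce says nothing about the size of the effect in the actual data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented mean durations (msec) for five hypothetical word types.
consonant = [100.0, 130.0, 160.0, 190.0, 225.0]
rest_of_word = [530.0, 515.0, 495.0, 470.0, 445.0]
total = [c + r for c, r in zip(consonant, rest_of_word)]

r = pearson_r(consonant, rest_of_word)
consonant_range = max(consonant) - min(consonant)
total_range = max(total) - min(total)
```

Here the consonantal portion spans 125 msec while the word total spans only 40 msec, which is the kind of difference from which the text infers compensation.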
4.2 Fo contours
Results of the Fo measurements are shown in Fig. I-B-9. The upper
part of this figure shows Fo contours for 7 words and subject no. 2. The
Fo contours have been aligned with respect to the point where they all run
through 125 Hz. All curves fall between the two extreme curves. All
words with voiced consonants have a lower initial Fo than the words with un-
voiced consonants. The lower part shows how the acoustic segments of the
words are organized when the point of 125 Hz is the basis of comparison.
During the stressed vowels the course of Fo is evidently more or less iden-
tical for all words. The onset of the vowel rather than the beginning of the
word seems to determine the timing of the Fo contour and of the final [d].
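
The alignment of contours at the 125 Hz point can be sketched as follows. The report aligned curves on paper records; the linear interpolation between sampled points used here is an assumption, and the contour values and function names are invented for illustration.

```python
def crossing_time(times, f0, level=125.0):
    """Time at which a sampled Fo contour first passes through `level`,
    found by linear interpolation between neighbouring samples."""
    for i in range(len(times) - 1):
        f_a, f_b = f0[i], f0[i + 1]
        if f_a != f_b and (f_a - level) * (f_b - level) <= 0:
            return times[i] + (level - f_a) * (times[i + 1] - times[i]) / (f_b - f_a)
    raise ValueError("contour never passes through the given level")


def align(times, f0, level=125.0):
    """Shift the time axis so that the crossing point falls at t = 0."""
    t_cross = crossing_time(times, f0, level)
    return [t - t_cross for t in times]


# Invented falling contour: time in msec, Fo in Hz.
times = [0.0, 50.0, 100.0, 150.0]
f0 = [140.0, 130.0, 120.0, 110.0]

t_125 = crossing_time(times, f0)
aligned_times = align(times, f0)
```

Once every contour is shifted in this way, segment boundaries from different words can be compared on the common time axis, as in the lower part of Fig. I-B-9.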
5. Chain model and preplanning model
There are two models of speech timing to which these data will be related.
One is the chain model, an extreme version of which presupposes that dura-
tionally a sequence of speech is simply the sum of its constituent parts.
Fig. I-B-8. Duration of consonants or consonant clusters expressed as a function of the duration of [a:], [a], and [d]. (Axis label: duration of [a:], [a], or [d] in msec.)
To exemplify, the duration of [st] would be expected to be equal to the sum
of the inherent durations of [s] and [t].
The other model is the preplanning model. According to one interpreta-
tion of this model a given amount of time is assigned to a certain unit of
speech (e.g. the syllable) and the durations of its constituent parts are adjusted
so as to make its duration constant.
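
The contrast between the two models can be sketched in a few lines. The chain version follows the text directly (durations add up). For the preplanning version the text only says that the parts are adjusted to keep the unit constant, so the proportional scaling used here is an assumption, and the inherent durations are invented.

```python
def chain_duration(inherent):
    """Chain model: the unit lasts as long as the sum of its parts."""
    return sum(inherent.values())


def preplanned_durations(inherent, unit_duration):
    """Preplanning model, one possible reading: every part is scaled by
    the same factor so that the unit has a fixed duration (assumption)."""
    scale = unit_duration / sum(inherent.values())
    return {segment: duration * scale for segment, duration in inherent.items()}


# Invented inherent durations (msec) for a syllable containing [st].
inherent = {"s": 120.0, "t": 100.0, "a:": 280.0}

chain_total = chain_duration(inherent)
planned = preplanned_durations(inherent, unit_duration=400.0)
planned_total = sum(planned.values())
```

Under the chain model adding a consonant lengthens the syllable by that consonant's inherent duration, whereas under this reading of the preplanning model every segment gives up a share of its duration and the total stays fixed.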
Fig. I-B-8 provides evidence that can be interpreted in this direction.
The durational variation of the test consonants is clearly compensated for,
albeit only to a slight extent.
On the other hand, Fig. I-B-9 demonstrates that the Fo contours are
fairly well synchronized during the [a:] vowels in spite of the variations in
consonant duration. This seems to lend support to the chain model. Con-
sequently a correct model of speech timing is likely to incorporate features
of both principles.
References:
Allen, G. (1968): "The place of rhythm in a theory of language", WPP, UCLA, No. 10, pp. 60-64.
Allen, G. (1970): "The location of rhythmic stress beats in English: An experimental study", WPP, UCLA, No. 14, pp. 80-132.
Eggermont, J. (1969): "Location of the syllable beat in routine scansion recitations of a Dutch poem", IPO Annual Progress Rep., No. 4, pp. 60-69.
Lehiste, I. (1971): "Temporal organization of spoken language", in Form and Substance (Copenhagen), pp. 159-169.
Lindblom, B. (1970): "Temporal organization of syllabic processes", invited paper presented at the ASA meeting at Atlantic City, April 1970.
Kozhevnikov, V.A. and Chistovich, L.A. (1965): Speech: Articulation and Perception (transl. from Russian, US Dept. of Commerce, Washington, D.C.).
Öhman, S. (1965): "On the coordination of articulatory and phonatory activity in the production of Swedish tonal accents", STL-QPSR 2/1965, pp. 14-19.