
Page 1: On the production of aspiration and prevoicinglib.ugent.be/fulltxt/RUG01/001/786/578/RUG01-001786578_2012_0001_AC.pdf · Academic Year 2010 - 2011 On the production of aspiration


Academic Year 2010 - 2011

On the production of

aspiration and prevoicing

The effect of training on native speakers of Belgian Dutch

Janey Vanlocke

Supervisor Dr. Ellen Simon

Master thesis submitted in partial fulfilment of the requirements for the degree of Master in English-Italian Literature and Linguistics


TABLE OF CONTENTS

Acknowledgements
1. Introduction
2. Aspiration and prevoicing
  2.1 Introduction to both features
  2.2 Aspiration
    2.2.1 What is aspiration?
    2.2.2 Positive Voice Onset Time
    2.2.3 The effect of place of articulation
    2.2.4 English vs. Dutch
  2.3 Prevoicing
    2.3.1 What is prevoicing?
    2.3.2 Negative Voice Onset Time
    2.3.3 Influencing factors
    2.3.4 English vs. Dutch
3. The effect of training
  3.1 Perception vs. production
    3.1.1 Perception
    3.1.2 Production
    3.1.3 Audiovisual training
  3.2 Real-time spectrograms
  3.3 Other techniques used in pronunciation training
    3.3.1 Feedback
    3.3.2 Contrasting with native language
4. Case study
  4.1 Hypotheses
    4.1.1 General aim of the case study
    4.1.2 Specific hypotheses
  4.2 Method
    4.2.1 Participants
    4.2.2 Stimuli and design
    4.2.3 Procedure
    4.2.4 Analysis
  4.3 Results and discussion
    4.3.1 Analysis
    4.3.2 Pretest
    4.3.3 Posttest
    4.3.4 Pretest vs. posttest: a comparison
    4.3.5 Summary
5. Conclusion
References
Appendices
  Appendix A: Questionnaire
  Appendix B: Pretest and Posttest
    1. List of tokens used in pre- and posttest picture-naming task
    2. Slides as presented in pre- and posttest picture-naming task
    3. List of tokens used in posttest word-reading task
    4. List of words as presented in posttest word-reading task
  Appendix C: Training session
    1. Slides used in training session on aspiration and prevoicing
    2. Handout with tips on aspiration and prevoicing
  Appendix D: Copy of recordings
  Appendix E: Results of pretest
    1. Aspiration
    2. Prevoicing
  Appendix F: Results of posttest
    1. Picture-naming task
    2. Word-reading task
  Appendix G: Results of pre- and posttest picture-naming task


ACKNOWLEDGEMENTS

First and foremost, I would like to thank my supervisor Dr. Ellen Simon, who guided me through the writing process and provided me with indispensable advice. I would also like to thank her for her professional feedback and for taking the time to meet up with me to discuss my progress (or lack thereof).

Thank you to all of the patient and cooperative volunteers who took the time to take part in this project and who worked with me with much enthusiasm. Truly, without them this research paper would not have been possible.

Since completing this Master thesis is the final step before graduating, I would also like to take

this opportunity to thank some very important people who have supported me through my

studies from day one.

I am extremely grateful to my parents for believing in me and for never doubting I would

get there in the end, even when I myself did not. I hope I have made them proud.

A huge thank you to my lovely boyfriend, who never stopped supporting me, always pushed me to do the best I could and trusted that I would succeed in the end.

My special thanks go to my grandmother, who I remember taught me how to achieve the

correct English pronunciation, even when I was very little.


1. INTRODUCTION

The topic of the present paper is the production of aspiration of voiceless plosives /p, t, k/ and prevoicing of voiced plosives /b, d/ by Belgian Dutch speakers of English. More specifically, the paper examines the effect that specific pronunciation training on both of these features has on non-native speakers of English. In other words, it asks whether or not a single training session can contribute to a learner’s knowledge and practical application of the pronunciation of a foreign language.

In learning a non-native language, speakers are known to transfer from their native language (henceforth, L1), i.e. to apply certain features of their L1 in the foreign language (henceforth, L2). Transfer occurs not only on a grammatical level, but also on a phonetic level (e.g. Collins & Vandenbergen, 2000). These features, however, might not be in use in the L2, or not in the same way as in the L1; hence transfer often leads to mistakes or misunderstandings in the foreign language. Under the influence of their native language, L2 learners often produce certain pronunciation features in English the same way they would in their native language. For example, speakers of Dutch frequently replace English /e/ with /æ/, as in <send for help>, which then turns into */ˈsænd fə ˈhælp/. In other words, native speakers of Dutch mispronounce features of English because they extrapolate features from their L1 onto the L2.

The learning of a foreign language has also been widely discussed with regard to the so-called critical age. Flege (1989; mentioned in Hattori, 2009), for example, draws attention to the fact that training phonetic contrasts becomes very difficult when these were not taught and maintained at an early age. However, Flege (1995; mentioned in Hattori, 2009) also suggests that, even in adulthood, speakers retain the ability to learn a language. Furthermore, pronunciation training, or more generally learning new information about a language, is known to be effective even at a later stage in life, i.e. past the critical age (e.g. Hattori, 2009).

The first part of the present paper provides background information on the processes of aspiration and prevoicing and on training methods. The first section explores the process of aspiration of the voiceless plosives /p, t, k/, which is a prominent feature in English but not in Dutch. The differences in the realization of voiceless plosives between the two languages, i.e. English and Belgian Dutch, are pointed out and illustrated by means of stills of spectrograms and waveforms. In the second section, prevoicing is discussed. In Dutch, initial voiced plosives are produced with prevoicing; this feature is, however, absent in English. The final section of the first part provides a brief overview of several widely researched methods or techniques of pronunciation training and of their effectiveness.


In the second part, a case study is presented in which an experiment was conducted with 13 Belgian Dutch speakers. The experiment consisted of two parts. First, in the pretest, the informants were asked to name pictures in English. These words tested them on their production of the two features at hand, i.e. aspiration and prevoicing. After the pretest they took part in a short training session in which the aforementioned features were explained theoretically and then practiced. The second part of the experiment, i.e. the posttest, consisted of the same picture-naming task as in the pretest and a word-reading task.

The object of the present study is threefold: (1) to find out if native speakers of Belgian Dutch produce aspiration in their production of English, (2) to ascertain whether or not native speakers of Belgian Dutch transfer the production of prevoicing onto English and (3) to establish if a single training session has an effect on the production of both features, in other words, if after training the informants produce aspiration and/or omit prevoicing when uttering English words. A minor additional aim of the experiment was to establish whether the effect of the pronunciation training can be noticed only in the words which were specifically trained during the session or whether the participants generalized the information to other, so-called new words.

It was hypothesized that before training, the speakers would not produce aspiration and would prevoice heavily. After having completed the pronunciation training, the informants were expected to improve their pronunciation towards a more target-like one (i.e. with aspiration and without prevoicing), especially in the specific words which were trained during the one-on-one instruction.

The results showed that a single short pronunciation training session can influence the production of both aspiration and prevoicing. Most of the participants showed considerable improvement in both features. Overall, an improvement of 18,1% for aspiration and 23,9% for prevoicing was established after only one training session.


2. ASPIRATION AND PREVOICING

2.1 INTRODUCTION TO BOTH FEATURES

Van Alphen & Smits (2004) explain that in most languages, plosives are divided into two phonemic categories, i.e. voiced /b, d, g/ and voiceless /p, t, k/¹. They point out that the phonetic realization of these phonemic classes varies among languages. Since this study is on the pronunciation of English by native speakers of Belgian Dutch, a concrete comparison between the two languages will be made at the end of both subchapters to point out differences and/or similarities (see Sections 2.2.4 and 2.3.4).

In a study by Lisker and Abramson (1964), the notion of Voice Onset Time (henceforth,

VOT) is put forward as an important player in the production of stop consonants. VOT is the

elapsed time between the release of the stop and the onset of vocal fold vibration. After having

studied 11 languages, Lisker & Abramson (1964) were able to conclude that VOT distinguishes

between three categories of plosives: plosives with a negative VOT, with a slightly positive VOT

and with a clearly positive VOT. The first category – termed fully voiced – which gives rise to a

negative VOT, is the result of the production of voicing during the closure. This process is also

called prevoicing. In other words, the vocal folds have started vibrating before the release of the

initial plosive. Since the release of the plosive counts as the starting point for VOT measurement,

the VOT recorded in voiced plosives is negative in the case of prevoicing. The process of

prevoicing will be discussed more elaborately in Section 2.3. The plosives which are produced

with little or no aspiration and thus show only a slightly positive VOT make up the second

category, also labelled as voiceless unaspirated². The third category involves those plosives which lead to the production of a clearly positive VOT as a result of aspiration, i.e. the process by which a weak ‘h’-like sound follows the release of the plosive. This last category is otherwise known as voiceless aspirated. Since the onset of voicing is delayed by the production of aspiration (which is voiceless), the period of voicelessness is longer, i.e. the VOT will be longer than when no aspiration can be detected. A more detailed account of this process will be given in Section 2.2.
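The three-way distinction described above can be sketched as a small classification function. This is only an illustration: the boundary of 35 ms between "slightly positive" and "clearly positive" VOT is an assumed value chosen for the example, not a figure given by Lisker & Abramson (1964), and the function name and the sample VOT values are hypothetical.

```python
def classify_vot(vot_ms, aspiration_threshold_ms=35.0):
    """Assign a VOT value (in ms) to one of the three plosive categories.

    aspiration_threshold_ms is an assumed, illustrative boundary between
    'slightly positive' and 'clearly positive' VOT.
    """
    if vot_ms < 0:
        # Voicing starts before the release: negative VOT (prevoicing).
        return "fully voiced"
    if vot_ms < aspiration_threshold_ms:
        # Voicing starts at or shortly after the release.
        return "voiceless unaspirated"
    # Voicing is delayed by a period of aspiration.
    return "voiceless aspirated"


# Hypothetical VOT values, one per category.
for example_vot in (-80.0, 15.0, 70.0):
    print(example_vot, "ms:", classify_vot(example_vot))
```

Making the threshold a parameter reflects the fact that the boundary between the two positive categories is a matter of degree and may differ between languages and places of articulation.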

The existence of these three categories of voicing implies that any language could make use of them. This matter has been studied, and the conclusion is that most languages in fact do not employ all three. Thai, for example, uses all three categories, as van Alphen & Smits (2004) state, but it is one of the very few languages that do so. A two-way distinction is the most common among the languages of the world.

¹ Voiced and voiceless plosives are also referred to as lenis and fortis plosives, respectively.

² Keating, Linker & Huffman (1983; mentioned in van Alphen & Smits, 2004) showed that this category is the most common one of the three. Of the 51 languages they studied, almost all used the voiceless unaspirated category. For a more detailed report on the study see Keating, P. A., Linker, W. and Huffman, M. 1983. Patterns in allophone distribution for voiced and voiceless stops. Journal of Phonetics. 11. 277-290. (quoted in van Alphen & Smits, 2004).

2.2 ASPIRATION

As explained in Section 2.1, the phonetic realization of the fortis plosives /p, t, k/ differs among

languages. In analogy to what is shown in the introduction (Section 2.1), namely that the

languages of the world can be categorized according to their phonetic realizations of plosives

(i.e. fully voiced, voiceless unaspirated and voiceless aspirated), Collins and Mees (2008) state

that languages are divided into aspirating and non-aspirating languages.

Aspiration is a feature present in most English accents. This situation is different in Dutch,

in which fortis plosives are not aspirated. In other words, English and Dutch are characterized

by a different phonetic realization of /p, t, k/. The phenomenon also does not occur in the

Romance languages (e.g. Italian, French, and Spanish).

2.2.1 WHAT IS ASPIRATION?

Collins & Mees (2008) explain that aspiration (as touched upon in Section 2.1) is often described as a small puff of air which occurs after the release of the voiceless stop consonants /p, t, k/. In phonetics, it is symbolized as [ʰ].

Aspiration is strongest in word-initial stressed position, e.g. in the English word <pie>, phonetically [pʰai] (e.g. Collins & Mees, 2008; Collins & Vandenbergen, 2000). It is, however, less strong in unstressed syllables, as for example in <competitor>, phonetically [kəmˈpʰetitə], in which aspiration is much more evident in the stressed stop consonant /p/ than in the unstressed initial /k/ or in either /t/ (e.g. Collins & Mees, 2008; Collins & Vandenbergen, 2000). It is also noteworthy that when a stop or a stop cluster is preceded by the fricative /s/, no aspiration is produced, for example in <spoon> (e.g. Collins & Mees, 2008; Collins & Vandenbergen, 2000).

Aspiration manifests itself as a period of voicelessness, which is essentially a delay in the

onset of voicing of the following vowel. This period of voicelessness is expressed in VOT.


2.2.2 POSITIVE VOICE ONSET TIME

VOT, which is measured in milliseconds (henceforth, ms), is the time that elapses between the

release of a stop consonant and the onset of voicing. When aspiration occurs, the onset of

vibration of the vocal folds is delayed. In other words, the period of voicelessness is prolonged,

which in turn will render a longer VOT than when aspiration is absent.

Two examples are given below (Fig. 1 and Fig. 2), which show the difference in VOT (marked in red) between a plosive produced without aspiration (Fig. 1) and one produced with aspiration (Fig. 2). To maximize the difference, one example is taken from English³ (Fig. 2) and the other from Dutch⁴ (Fig. 1). Both tokens were produced by native speakers of the respective language.

Figure 1 Waveform and spectrogram of the word <kus> produced by a native speaker of Belgian Dutch. (Praat, Boersma & Weenink, 2011)

Figure 2 Waveform and spectrogram of the word <kite> produced by a native speaker of English. (Praat, Boersma & Weenink, 2011)

³ The sound file for this example was cut from the audio CD included in Collins & Vandenbergen (2000).

⁴ The audio file for this word was taken from my Bachelor Research paper (Vanlocke, 2010) in which native speakers of Belgian Dutch were asked to perform a reading task.

In Fig. 1, the waveform and spectrogram of the Dutch word <kus> (<kiss>) are shown, which is produced without aspiration. As Fig. 1 shows, voicing starts shortly (i.e. 30,3 ms) after the burst. Fig. 2 shows the waveform and spectrogram of the English word <kite>, produced with aspiration. Vocal fold vibration, associated with the production of the following vowel (here /ai/), starts later (i.e. only after 71,9 ms) than when aspiration is not produced (as in Fig. 1). Since English is an aspirating language, the VOT of the fortis plosive /k/ in word-initial stressed position is much longer (71,9 ms) than that in the non-aspirating language Dutch (30,3 ms).
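As a minimal sketch of how a VOT value such as those above is obtained, the measurement can be written as the difference between two annotated time points: the release burst and the onset of voicing. The function name and the time stamps below are hypothetical and chosen only so that the resulting durations match the two VOTs reported for <kus> and <kite>; they are not the actual annotation times of the recordings.

```python
def vot_ms(release_s, voicing_onset_s):
    """Return VOT in milliseconds from two time points given in seconds.

    The release of the plosive is the reference point, so voicing that
    starts before the release (prevoicing) yields a negative value.
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical annotation times (seconds) for the two tokens.
kus_vot = vot_ms(release_s=0.1000, voicing_onset_s=0.1303)   # Dutch <kus>
kite_vot = vot_ms(release_s=0.2500, voicing_onset_s=0.3219)  # English <kite>

print("kus :", round(kus_vot, 1), "ms")
print("kite:", round(kite_vot, 1), "ms")
print("difference:", round(kite_vot - kus_vot, 1), "ms")

# A prevoiced token: voicing begins before the release, so VOT < 0.
print("prevoiced example:", round(vot_ms(0.2500, 0.1700), 1), "ms")
```

The values are rounded to one decimal place because floating-point subtraction of time stamps rarely gives an exact decimal result.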

2.2.3 THE EFFECT OF PLACE OF ARTICULATION

As mentioned before (Section 2.2.1), stress is a factor which can influence VOT (e.g. Collins & Mees, 2008; Collins & Vandenbergen, 2000), i.e. VOT is longer in stressed syllables than in unstressed syllables. Other factors that have been reported to have an effect on VOT are speaking rate and place of articulation. With regard to speaking rate, studies propose that, as speaking rate decreases, VOT increases (e.g. Kessinger & Blumstein, 1998; Magloire & Green, 1999). The second factor, i.e. place of articulation, will be discussed in greater detail.

According to Lisker and Abramson (1964), VOTs vary according to the category of plosive that is being produced. Through their data acquired from eleven different languages, they were able to ascertain that there is a difference in VOT with regard to the place of articulation (henceforth, PoA) of the plosives. Their results revealed that VOTs are always longer⁵ in the realization of velar stops than in that of alveolar or bilabial stops. A further distinction can be made between the alveolar and bilabial stops, the VOTs of the former being longer than those of the latter. In other words, VOTs of voiceless stop consonants relate to each other in the following manner: bilabial < alveolar < velar, i.e. p < t < k.

In Figures 2 (see Section 2.2.2) to 4 (presented below), waveforms and spectrograms are shown for each of the three voiceless plosives, indicating the difference in VOT duration. All tokens are English words produced by native speakers⁶. For each of the tokens, the VOT is marked in red. These randomly chosen examples are consistent with the reported influence of PoA on VOT, since the proposed ordering p < t < k is respected.

The shortest category, namely the bilabial plosive in the word <peas>, is presented in Fig. 3.

In this case, initial /p/ was produced with a VOT measured at 41,0 ms. Fig. 4 displays the

intermediate alveolar plosive /t/ represented here in the word <time>, with a VOT of 53,0 ms.

Comparing these two results with the VOT for /k/ in <kite> (see Fig. 2, above), it becomes clear

that the velar category renders the longest VOTs. Here it was measured at 71,9 ms, considerably

longer than either of the other two examples (71,9 ms > 53,0 ms > 41,0 ms).
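The comparison of the three example tokens can be expressed as a simple ordering check. The sketch below uses only the three VOT values reported above; the function name and the dictionary layout are assumptions made for illustration.

```python
# VOT values (ms) of the three example tokens: <peas>, <time> and <kite>.
measured_vot_ms = {"p": 41.0, "t": 53.0, "k": 71.9}


def respects_poa_order(vots, order=("p", "t", "k")):
    """Check that VOT increases strictly from bilabial to alveolar to velar."""
    values = [vots[stop] for stop in order]
    return all(earlier < later for earlier, later in zip(values, values[1:]))


print(respects_poa_order(measured_vot_ms))
```

Three single tokens can only illustrate the ordering; they cannot establish it statistically, which is why the check is presented as a consistency test on the examples rather than as evidence in its own right.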

Figure 3 Waveform and spectrogram of the word <peas> produced by a native speaker of English. (Praat, Boersma & Weenink, 2011)

Figure 4 Waveform and spectrogram of the word <time> produced by a native speaker of English. (Praat, Boersma & Weenink, 2011)

⁵ This is true whether or not the stops are aspirated. In other words, the difference will also be noticeable in Dutch.

⁶ All examples were taken from the audio CD included in Collins & Vandenbergen (2000).

An explanation for this phenomenon was not provided by Lisker & Abramson (1964); however, more recent studies have investigated and verified it, within one language as well as cross-linguistically⁷. Moreover, the processes that take place in the oral cavity while speaking have now been studied in more detail. The difference in VOT for different places of articulation is due to complex mechanisms⁸ performed by the speaker. Cho & Ladefoged (1999) provide a brief summary of physiological and/or aerodynamic characteristics⁹ that have been suggested in the literature as reasons for the effect of PoA on VOT. These characteristics are the following: (1) the volume of the cavity behind the point of constriction, (2) the volume of the cavity in front of the point of constriction, (3) movement of articulators, (4) extent of articulatory contact area, (5) change of glottal opening area (for voiceless aspirated stops), and finally (6) temporal adjustment between closure duration and VOT (Cho & Ladefoged, 1999).

Cho & Ladefoged (1999) furthermore provide an evaluation of each of these factors. Since some explanations are more apt for unaspirated stops and others for aspirated stops, they propose another explanation, which they claim holds for both aspirated and unaspirated stops. Since this study is on aspiration, the explanation that was put forward by Cho & Ladefoged (1999) as the most suitable for aspirated stop consonants, number (5) mentioned above, will be discussed more elaborately. Without going into too much detail, Stevens¹⁰ (1999; quoted in Cho & Ladefoged, 1999) attributes the varying VOTs in voiceless aspirated stops to two main factors, namely the opening of the glottal area in various stages in accordance with the different PoA, and the way the stiffness of the walls of the vocal tract and of the glottis changes during the realization of the plosives.

⁷ Cho & Ladefoged (1999) studied VOTs of eighteen languages and found evidence that in all but one of these, velar stops showed the longest VOTs.

⁸ Liu, Ng, Wan, Wang & Zhang (2007) mention an array of studies which account for the differences in VOT caused by the PoA. The topics which Liu et al. (2007) enumerate as possible reasons for the different VOTs associated with PoA are: “physiological and aerodynamic characteristics of speech production including the laws of aerodynamics, velocity of the articulators movements, the extent of articulator contact, as well as the temporal adjustment between closure duration and VOT” (Liu et al., 2007).

⁹ For a more detailed account of these specific characteristics, see Cho & Ladefoged (1999).

Firstly, Stevens (1999; quoted in Cho & Ladefoged, 1999) gives an explanation for the

difference in VOTs associated with the PoA by discussing the changes occurring in the glottal

area. For the production of aspirated stops, it has been demonstrated that, before the release,

the glottis is already open. This opening is created in order to yield aspiration. By contrast, to

allow for voicing – which occurs after the release – the glottal area opening must be reduced.

Only when the glottal opening has decreased to approximately 0,12 cm² will the vocal folds be

able to vibrate. The speed at which this decrease occurs is what Stevens (1999; quoted in Cho &

Ladefoged, 1999) proposes as the reason for the difference between the plosive categories. It is

suggested that the reduction in size of the glottis area happens faster for alveolar or labial stops

than for velars because the intraoral pressure present before the release drops more rapidly

with the production of the former than for the latter (see Fig. 5).

Figure 5 Schematized representation of the airflow and intraoral pressure in the release phase in voiceless stops (Stevens, 1999; as presented in Cho & Ladefoged, 1999).

10 It must be noted that Stevens (1999; quoted in Cho & Ladefoged, 1999) did not consider unaspirated stops in his description. Only aspirated stops were taken into account.


Secondly, during the closure phase of the production of the plosive, the walls of the glottis

and the vocal tract tense up, presumably as compensation for the pressure in the oral cavity.

When the release takes place, the intraoral pressure obviously decreases, and this causes an

inward force to strike the glottal walls. As a consequence, the stiffness also diminishes.

Immediately following the release, however, the stiffness does not disappear completely. This

prolonging of the stiffness causes voicing to be delayed. Because the intraoral pressure reduces

more rapidly for bilabials and alveolars than for velar plosives, the walls of the glottis and the

vocal tract relax more rapidly, which creates the opportunity for the vocal folds to vibrate earlier

after the release.

In short, the release of bilabial and alveolar voiceless aspirated stops gives rise to a faster

drop of intraoral pressure, which in turn causes the decrease of the opening of the glottal area to

be faster and the relaxation of the stiffness to happen sooner. These factors combined induce the

onset of voicing to occur sooner for bilabial and alveolar stops than for velar stops.

Consequently, the period of voicelessness and hence the VOT is longer in velar than in bilabial or

alveolar stops.

Even though most of the research on the effect of PoA on VOT finds that the VOTs of the three

voiceless stops relate to each other as p < t < k, there have also been reports of results which

point to a less drastic distinction between the VOTs of the different PoAs (Whalen, Levitt &

Goldstein, 2007; see Figure 6). Figure 6 shows that for English-speaking adults, most researchers

have found results that support the p < t < k relationship. There are however others who have

suggested p < t = k as a more correct distinction. Looking at English spoken by children from the

age of one up to seven, it can be noted that even p = t < k is put forward as an option. Then again,

when only the top half of the table is taken into consideration, i.e. VOTs in English, it can be

noted that most researchers agree on the relationship between voiceless plosives as p < t < k.


Figure 6 A report on the effect of PoA on VOT in earlier works (as presented in Whalen, Levitt & Goldstein, 2007)

2.2.4 ENGLISH VS. DUTCH

Unlike English, Dutch belongs to the languages characterized by the production of unaspirated

rather than aspirated stops. As a consequence, the duration of VOTs is very different in these

two languages. On average, English VOTs can be measured around 76 ms (Simon, 2010). Their

Dutch equivalents however range between a mere 12 and 27 ms (Simon, 2010).

Previous studies have provided mean VOT measurements for both English and Dutch, which

serve as evidence for the proposed difference between the two languages. Lisker & Abramson’s

(1964) test results show that their English-speaking informants produced VOTs of 58, 70 and 80

ms for /p/, /t/ and /k/, respectively11. Docherty (1992) reports VOT values of 45,74 ms for /p/,

66,45 ms for /t/ and 66,09 ms for /k/. Hawkins (1979; quoted in Docherty, 1992)

found that voicing occurred after 47 ms in /p/, after 68 ms in /t/ and after 72 ms in /k/. The

study done by Suomi (1980; mentioned in Docherty, 1992) resulted in the following

measurements for /p, t, k/: 40, 55 and 56 ms. Finally, Simon (2010) found average VOTs of 80

ms in /p/, 73 ms in /t/ and 76 ms in /k/.

11 It should be pointed out that Lisker & Abramson (1964) tested with only four speakers. Hence, these results may not be entirely representative.


All of these results can be found in Table 1. In order to be able to compare these results to

the ones obtained in the present study, averages of these previous results were also calculated

and provided in the table below (Table 1).

Table 1 Mean VOTs (in ms) for English aspirated stops measured in previous studies.
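
The averaging step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it takes the per-study means quoted in the running text and assumes an unweighted mean across the five studies, which may not be exactly how the averages in Table 1 were obtained. The names ENGLISH_VOT_MS and mean_vot_per_stop are hypothetical and introduced here for illustration only.

```python
from statistics import mean

STOPS = ("p", "t", "k")

# Mean VOTs (in ms) for English /p/, /t/, /k/ as quoted in the running text.
ENGLISH_VOT_MS = {
    "Lisker & Abramson (1964)": (58.0, 70.0, 80.0),
    "Docherty (1992)": (45.74, 66.45, 66.09),
    "Hawkins (1979)": (47.0, 68.0, 72.0),
    "Suomi (1980)": (40.0, 55.0, 56.0),
    "Simon (2010)": (80.0, 73.0, 76.0),
}


def mean_vot_per_stop(studies):
    """Return the unweighted mean VOT per stop across all studies (assumption)."""
    columns = zip(*studies.values())  # one column of values per stop
    return {stop: round(mean(col), 2) for stop, col in zip(STOPS, columns)}


english_means = mean_vot_per_stop(ENGLISH_VOT_MS)
for stop, value in english_means.items():
    print(f"/{stop}/: {value} ms")
```

An unweighted mean treats every study equally, regardless of the number of speakers tested; a weighted mean would require speaker counts that are not available for all of the studies cited.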

For their Dutch unaspirated counterparts, Lisker & Abramson (1964) also provided mean

VOTs12. They found that VOTs lasted 10 ms for /p/, 15 ms for /t/ and 25 ms for /k/. Simon

(2010) reports averages of 12 ms for /p/, 23 ms for /t/ and 29 ms for /k/. These results are

presented in Table 2, along with the average calculated from these measurements.

Table 2 Mean VOTs (in ms) for Dutch unaspirated stops

measured in previous studies.

In the table below (Table 3), a comparison can be found between the average English and

Dutch VOTs for the three voiceless stops /p, t, k/.

Table 3 Comparison between the averages for English and Dutch VOTs (in ms).
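
The comparison between the two languages can be reproduced with a short sketch. It assumes, as a simplification, unweighted means over the per-study values quoted in the text; the variable and function names (ENGLISH, DUTCH, column_means, comparison) are hypothetical.

```python
from statistics import mean

STOPS = ("p", "t", "k")

# Per-study mean VOTs (in ms), ordered /p/, /t/, /k/, as quoted in the text.
ENGLISH = [
    (58.0, 70.0, 80.0),     # Lisker & Abramson (1964)
    (45.74, 66.45, 66.09),  # Docherty (1992)
    (47.0, 68.0, 72.0),     # Hawkins (1979)
    (40.0, 55.0, 56.0),     # Suomi (1980)
    (80.0, 73.0, 76.0),     # Simon (2010)
]
DUTCH = [
    (10.0, 15.0, 25.0),     # Lisker & Abramson (1964)
    (12.0, 23.0, 29.0),     # Simon (2010)
]


def column_means(rows):
    """Unweighted mean per stop across studies (an assumption of this sketch)."""
    return [mean(col) for col in zip(*rows)]


comparison = {
    stop: {
        "english": round(en, 2),
        "dutch": round(nl, 2),
        "difference": round(en - nl, 2),
    }
    for stop, en, nl in zip(STOPS, column_means(ENGLISH), column_means(DUTCH))
}

for stop, row in comparison.items():
    print(f"/{stop}/ English {row['english']} ms, Dutch {row['dutch']} ms, "
          f"difference {row['difference']} ms")
```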

This table highlights the considerable difference between English and Dutch VOTs, as a

result of the presence and absence of aspiration. Moreover, the effect of PoA as discussed in

Section 2.2.3 is clearly visible, in both English13 and Dutch.

12 These results were obtained from a single speaker of Dutch and should therefore be interpreted tentatively.

13 Only Docherty (1992; mentioned in van Alphen & Smits, 2004) found /k/ as the plosive with the intermediate VOT, i.e. p < k < t, as can be seen in Table 1. However, /t/ is a mere 0,36 ms longer than /k/.


2.3 PREVOICING

In some languages, as in Dutch, initial voiced plosives are produced with prevoicing (also see

Section 2.1). Other examples of languages belonging to this category are Arabic, Bulgarian,

French, Japanese, Polish, Russian and Spanish (examples provided by van Alphen & Smits, 2004).

These languages make a distinction between voiced and voiceless unaspirated plosives, which

means that the voiceless plosives /p, t, k/ are unaspirated and that the voiced plosives /b, d, g/

are characterized by the production of prevoicing. Dutch has only two types of voiced plosives in

its native phonology, namely /b/ and /d/. The voiced counterpart of velar /k/, namely /g/, is

only present in loanwords, such as <goal> (example taken from van Alphen & Smits, 2004). Just

as for voiceless plosives, English and Dutch have differing phonetic realizations of the voiced

plosives, which will be discussed in Section 2.3.4.

2.3.1 WHAT IS PREVOICING?

Van Alphen & Smits (2004) explain that prevoicing is produced during the phase in which the

closure of the initial plosive takes place. It is essentially the vibration of the vocal folds which

occurs before the realization of the initial voiced plosive. Van Alphen & Smits (2004) mention a

number of conditions that need to be met in order to create vibration of the vocal folds during

this process. They adopted these physiological and aerodynamic conditions from van den

Berg (1958; quoted in van Alphen & Smits, 2004). Two in particular are discussed.

The first of these conditions is that the vocal folds must be “adducted

and tensed” (van Alphen & Smits, 2004: 457). The second condition involves the transglottal

pressure, which must be adequately adjusted in order to render vocal fold vibration caused by

“enough positive airflow through the glottis” (van Alphen & Smits, 2004). When the articulators

are brought in the position for the production of a plosive, the exit way for the airflow is closed

off. The air that is passing through the glottis cannot leave the oral cavity which results in an

accumulation of intraoral pressure. Ohala (1983; quoted in van Alphen & Smits, 2004) states

that this way, the pressure built up in the oral cavity comes close to resembling subglottal

pressure. However, if the supraglottal area is expanded, this process will be slowed down. The

enlargement of the volume of the vocal tract makes voicing during the closure phase of the

plosive easier. This enlargement can be achieved in two ways: actively or passively14. The first

method – for which van Alphen & Smits (2004) refer back to studies done by Westbury (1983;

quoted in van Alphen & Smits, 2004) and Stevens (1998; quoted in van Alphen & Smits, 2004) –

14 It should be pointed out that van Alphen & Smits (2004) remark that in general it is believed that the processes of both active and passive expansion of the vocal tract volume are used in the production of prevoicing.


involves increasing the size of the area above the glottis. This process involves a number of

mechanisms: (1) the lowering of the larynx, (2) the raising of the soft palate, (3) the advancing of

the tongue root, or alternatively, drawing down the dorsum and blade of the tongue. For the

second method, van Alphen & Smits (2004) state that “the supraglottal volume can also be

expanded passively due to the raised intraoral pressure, provided that the walls of the

supraglottal cavity are lax.”15 (van Alphen & Smits, 2004: 457).

2.3.2 NEGATIVE VOICE ONSET TIME

As touched upon in Section 2.1, prevoicing is characterized by negative VOT. This means that

there is voicing to be detected before the release of the voiced stop consonant. Since voicing

occurs before the burst of the plosive, VOT is negative.
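
The sign convention can be made explicit with a small sketch. It assumes that the moment of the burst and the onset of vocal fold vibration have been annotated by hand (for instance in Praat) and are available as time stamps in seconds. The function names and the time stamps below are hypothetical and serve only as an illustration.

```python
def vot_ms(voicing_onset_s: float, burst_s: float) -> float:
    """Return VOT in ms: the voicing onset measured relative to the burst."""
    return (voicing_onset_s - burst_s) * 1000.0


def has_prevoicing(vot: float) -> bool:
    """A negative VOT means that voicing started before the release."""
    return vot < 0


# Hypothetical annotation times (in seconds) for two tokens.
lead = vot_ms(voicing_onset_s=0.120, burst_s=0.216)  # voicing precedes the burst
lag = vot_ms(voicing_onset_s=0.290, burst_s=0.216)   # voicing follows the burst

print(f"{lead:.1f} ms, prevoiced: {has_prevoicing(lead)}")
print(f"{lag:.1f} ms, prevoiced: {has_prevoicing(lag)}")
```

Taking the burst as the zero point means that prevoiced tokens receive negative values and aspirated or short-lag tokens receive positive values, so both kinds of stops can be placed on a single scale.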

In Fig. 7, the waveform and spectrogram of the Dutch word <bal>16 (<ball>) are presented.

The period of prevoicing is highlighted in red and the burst of the bilabial voiced plosive /b/ is

marked in purple. Since voicing starts before the release of the plosive /b/, the VOT is labelled as

negative, in this case -96,1 ms. In a word produced by a native speaker of English, no prevoicing

is expected to be found, as can be seen in Fig. 8, which shows the waveform and spectrogram of

the English word <bush>17. Fig. 8 shows no vibration of the vocal folds before the burst, i.e. no

prevoicing was produced. The presence versus absence of prevoicing, in Figures 7 and 8,

respectively, is also visible through the presence (Fig. 7) and absence (Fig. 8) of a voice bar in the

spectrogram (marked in a blue square).

15 For this statement, van Alphen & Smits (2004) draw upon studies by Rothenberg (1968), Stevens (1998) and Svirsky, Stevens, Matthies, Manzella, Perkell & Wilhelms-Tricarico (1997).

16 The sound file for this word was taken from my Bachelor Research paper (Vanlocke, 2010) in which native speakers of Belgian Dutch were asked to perform a reading task.

17 The sound file was cut from the audio CD included in Collins & Vandenbergen (2000).


Figure 7 Waveform and spectrogram of the word <bal> produced by a native speaker of Belgian Dutch.

(Praat, Boersma & Weenink, 2011)

Figure 8 Waveform and spectrogram of the word <bush> produced by a native speaker of English.

(Praat, Boersma & Weenink, 2011)


2.3.3 INFLUENCING FACTORS

Since fewer studies have been devoted to prevoicing than to aspiration, van Alphen & Smits

(2004) tried to obtain more detailed information on prevoicing in Dutch18. Interestingly, they

found that – despite it being an important auditory indicator in determining whether the plosive

is voiced or voiceless – speakers do not produce prevoicing consistently. Van Alphen & Smits

(2004) tested to what extent the production of prevoicing depends on the following factors: (1)

place of articulation, (2) the speaker’s gender, (3) the following phoneme, (4) lexical status and

(5) competitor environment.

Despite expectations expressed by van Alphen & Smits (2004) that factors (4) and (5)

would have an effect on the production of prevoicing, their study proved otherwise. With

respect to factor (4), lexical status, they had expected that when their informants were tested on

non-words, they would hyper-articulate and produce “more reliable prevoicing” (van Alphen &

Smits, 2004: 465), which turned out not to be the case. Finally, with respect to factor (5), a word

in a competitor environment was believed to be a reason to articulate more carefully in order to

reduce the chance of perceptual confusion, i.e. mistaking a voiced plosive for a voiceless one.

However, the data did not confirm this hypothesis.

2.3.3.1 THE EFFECT OF PLACE OF ARTICULATION

Van Alphen & Smits (2004) argue that, since the production of prevoicing is dependent on the

active or passive expansion of the vocal tract which keeps the transglottal pressure high enough,

a more posterior PoA will restrict this capacity for expansion. This argument is backed by the

research done on children whose native language distinguishes between voiced plosives

(produced with prevoicing) and voiceless unaspirated plosives. These studies show that those

children do not acquire this contrast as fast as children whose language makes a distinction

between voiceless unaspirated and voiceless aspirated stops. Rothman, Koenig & Lucero (2002;

mentioned in van Alphen & Smits, 2004) attribute this later acquisition to the fact that the size of

children’s vocal tracts is smaller than for adults. Since the vocal tract is smaller, expansion

capacity will automatically also be smaller. Hence, prevoicing will be less easy to produce when

the capacity to expand is smaller. During the production of bilabial plosives more opportunities

are available than for alveolars to both actively and passively create the required enlargements.

18 Based on what e.g. Lisker & Abramson (1964) found, van Alphen & Smits (2004) tested solely on isolated words, since they recognized that when produced in a sentence context, “the phonetic realization of the voicing distinction” (van Alphen & Smits, 2004: 458) may be affected.


Active expansion by means of tongue-body movements is most likely to be easier during the

production of /b/ than during that of /d/. While producing bilabial plosives, the tongue can

move more freely than in alveolar plosives since the production of the latter includes the tongue

as a means to render the necessary closure. As a consequence, the intraoral pressure will rise

more quickly in alveolars than in bilabials. Essentially, this means that prevoicing is easier to

produce when the PoA is more anterior, i.e. bilabial /b/, than when it is more posterior, i.e.

alveolar /d/.

Passively, more tissue surface can take part in the expansion process when bilabials are

produced than when alveolars are produced. Alveolars depend on “the pharyngeal walls and

part of the soft palate” (van Alphen & Smits, 2004: 459) for the passive enlargement. Meanwhile,

bilabials can not only make use of the aforementioned factors, but also of “all of the tongue

surface and parts of the cheek”19 (van Alphen & Smits, 2004: 459).

Although prevoicing may well be easier to produce in bilabials than in alveolars, van Alphen & Smits

(2004) did not find an influence of PoA on the duration of VOT. They did find that, since prevoicing is

easier to produce in bilabials than in alveolars, the former are prevoiced more frequently than

the latter. However, Smith (1978; mentioned in van Alphen & Smits, 2004) found that, in English,

prevoicing duration was also affected by PoA. Van Alphen & Smits (2004) expected this also to

be the case in Dutch, but their study proved otherwise. It could still occur that the current study

does show a difference in duration of VOT according to PoA, since the target-tokens are English

words but produced by native speakers of Belgian Dutch.

2.3.3.2 THE EFFECT OF GENDER OF SPEAKER

As previous studies have shown (e.g. Stevens, 1998; mentioned in van Alphen & Smits, 2004),

the size of the vocal tract of women is smaller than that of men. As a consequence, the pressure

in the oral cavity rises faster and in turn makes it more difficult for female than for male

speakers to produce prevoicing. Van Alphen & Smits (2004) found that women produced

prevoicing less frequently than men20. These results are in line with an earlier study by Smith

(1978; quoted in van Alphen & Smits, 2004). Van Alphen & Smits (2004) also found a slight

difference in the length of prevoicing, i.e. longer for men (mean: 109 ms) than for women (mean: 89

ms); however, this difference was not significant enough to be taken into account.

19 For these explanations, van Alphen & Smits (2004) drew on the works of Houde (1968; mentioned in van Alphen & Smits, 2004) and Rothenberg (1968; mentioned in van Alphen & Smits, 2004).

20 Of the ten participants tested by van Alphen & Smits (2004), five were male and five were female. The male speakers produced 86% of their tokens with prevoicing, which stands in stark contrast to the 65% produced by the female speakers.


2.3.3.3 THE EFFECT OF THE FOLLOWING PHONEME

Van Alphen & Smits (2004) also examined the effect of the phoneme following the initial plosive

on the production of prevoicing. The target tokens in their study either had a vowel or a

consonant as the second segment. Generally speaking, they found that when initial voiced

plosives are followed by a vowel, prevoicing is produced more often and longer (mean: -118 ms)

than when followed by a consonant (mean: -99 ms).

They claim that the reason for this phenomenon cannot merely be the difference in oral

cavity volume, because even though some of the following consonants give rise to a smaller oral

cavity size compared to when the initial plosive is followed by a vowel, this explanation would

not hold true for all the vowels and consonants they tested. In addition, they state that when

they compared the results among the plosives, they did not find a difference in duration of

prevoicing due to vowel height21 (which also results in differences in oral cavity size). Even

though they make an additional suggestion, namely that “the degree to which the vocal tract can

be expanded (passively or actively) plays a role” (van Alphen & Smits, 2004: 465), they do not

give a satisfactory explanation for this.

2.3.4 ENGLISH VS. DUTCH

A difference can be found between the phonetic realizations of the voiced stops /b/ and /d/ in

English and in Dutch (see Section 2.1). The latter belongs to the vast group of languages which

are characterized by the use of prevoicing. English, by contrast, does not belong to this group. Previous studies

have provided averages to which the results of this study will ultimately be compared.

On average, voiced plosives have been reported to render negative VOTs in Dutch of roughly

-100 to -80 ms and positive VOTs in English between 0 and 10 ms (Simon, 2010). However, it

has been noted that the actual production of prevoicing is largely dependent on the speaker:

some speakers produce it (consistently), while others do not (e.g. Lisker & Abramson, 1964; van

Alphen & Smits, 2004).

Lisker & Abramson (1964) provided average VOTs for voiced plosives, for both English and

Dutch, in isolated words and in connected speech. Words in isolation were produced with an

average VOT of -85 ms for /b/ and -80 ms for /d/ (by one native speaker of Dutch). Words with

/b/ or /d/ in initial position in sentence context were realized with a VOT of -41 ms and -51 ms,

respectively. For the four tested native speakers of English, they measured mean VOTs for /b/ of

anything between the two extremes of 1 to -101 ms and for /d/ between 5 and -102 ms.

21 This is contrary to the results in Smith (1978; quoted in van Alphen & Smits, 2004) who found vowel height to be an influencing factor on the duration of prevoicing as well as on the number of tokens that were prevoiced.


Surprisingly, these negative VOTs are longer than the ones recorded in Dutch, which is contrary

to what is generally expected (e.g. Simon, 2010). However, they state that these results can be

attributed to the drastic variation among speakers in whether or not they are producing

prevoicing. Due to the fact that one of the tested speakers produced long negative VOTs while

the other three did not, this is automatically reflected in the averages reported by Lisker &

Abramson (1964), which actually give a distorted picture of reality. In sentences, the reported

averages range from 7 to -65 ms in /b/ and from 9 to -56 ms in /d/. It must be stressed that the

limited number of speakers tested by Lisker & Abramson leads us to make the tentative

conclusion that these values are not representative. Van Alphen & Smits (2004) obtained means

from 10 native speakers of Dutch. They found averages of -82,80 ms and -71,23 ms for /b/ and

/d/, respectively. Table 4 offers a comparison between the results given by Lisker & Abramson

(1964) and by van Alphen & Smits (2004). Furthermore, it shows the average VOTs calculated22

from the results of the aforementioned studies.

Table 4 Mean VOTs (in ms) for Dutch prevoiced stops measured in previous studies.

* These results represent the averages measured in sentence context.
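
The calculation of the averages can be sketched as follows. The sketch uses the values quoted in the text and, in line with the remark that the sentence-context measurements were left out of the calculation, averages only the results for isolated words. An unweighted mean over the two studies is assumed; the names DUTCH_PREVOICED, isolated, mean_b and mean_d are hypothetical.

```python
from statistics import mean

# (study, context, VOT for /b/, VOT for /d/) in ms, as quoted in the text.
DUTCH_PREVOICED = [
    ("Lisker & Abramson (1964)", "isolated", -85.0, -80.0),
    ("Lisker & Abramson (1964)", "sentence", -41.0, -51.0),
    ("van Alphen & Smits (2004)", "isolated", -82.80, -71.23),
]

# Keep only the measurements taken from words produced in isolation.
isolated = [row for row in DUTCH_PREVOICED if row[1] == "isolated"]

mean_b = round(mean(row[2] for row in isolated), 2)
mean_d = round(mean(row[3] for row in isolated), 2)

print(f"/b/: {mean_b} ms, /d/: {mean_d} ms")
```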

Simon (2010) tested ten native speakers of English on the production of prevoicing. She

found that on average they produced VOTs of -93 ms for /b/ and of -91 ms for /d/. She

furthermore carried out a test which included ten native speakers of Dutch, who produced

English words. The averages that are reported here are -113 ms for /b/ and -105 ms for /d/.

22 The table also shows the VOTs found in sentence context. These were, however, not included in the calculation.


3. THE EFFECT OF TRAINING

Over the years, many research papers have been devoted to the topic of pronunciation training

to L2 learners (or learners of a foreign language), and to the question of what is the best method

to teach pronunciation, i.e. the method(s) which yield(s) the most improvement. This chapter

will provide a brief overview of the methods which have been applied in search of the best

methods of pronunciation training. This search ranges from the discussion whether perception

or production training is the technique which gives rise to the most positive result, to the use of

real-time spectrograms which give the learner instant clues on what can and should be

improved.

3.1 PERCEPTION VS. PRODUCTION

It is known that learning both perception and production of a non-native language is difficult,

and is dependent on several factors (e.g. Flege, 1995; Guion, Flege, Akahane-Yamada & Pruitt,

2000; mentioned in Hazan, Sennema, Iba & Faulkner, 2005). Flege (1995; quoted

in Hazan et al., 2005), for example, has pointed out that interference from the native language

into the target language, i.e. the non-native language, can be detected. Interference is most

frequent when the sounds are not present in the learner’s native language or when these sounds

have differing phonological realizations in both languages.

Various researchers have found that perception training affects both perception and

production (e.g. Bradlow, Pisoni, Akahane-Yamada & Tohkura, 1997; Flege, 1989; see Section

3.1.1). Others were keen to find out whether production training also affected both production

and perception (e.g. Hattori & Iverson, 2008; Hattori, 2009; Mildner & Tomić, 2007; see Section

3.1.2). The following sections not only provide information on studies which have explored the

aforementioned phenomena, but they also discuss what effect audiovisual training (see Section

3.1.3) has on pronunciation of learners.

3.1.1 PERCEPTION

In his study on Chinese students’ perception of the word-final English /t/-/d/ contrast, Flege

(1989) was able to ascertain that the native language has a significant effect on L2 performance.

He found that those students whose native language does not allow for word-final obstruents

benefit less from perceptual training than those whose L1 involves obstruents in word-final

position. In other words, Flege (1989) argues that training can have a positive effect on L2

perception but only if the feature which learners are being trained on is already present in the

L1.


According to Flege (1989), the success of training also depends on the auditory nature of

the feature which is being learned, i.e. contrasts which are not present in the L1 but which are

easily distinguishable are acquired more easily. For example, the contrast between Zulu clicks or

the contrast between Hindi voiceless aspirated and breathy voiced dental stops hardly requires

training because these contrasts are auditorily easier to detect. A feature which is not so easily

recognizable will be harder to learn. Furthermore, Flege (1989) partially attributes the

mispronunciation of certain features to learners’ “inability to perceive L2 phones or phonetic

contrasts in a nativelike manner” (Flege, 1989, p. 1684). Flege (1989) thus stresses the

importance of auditory training rather than production training. Moreover, he stresses the

importance of perception as a positive influence on later production.

In accordance with Flege (1989), Bradlow et al. (1997) also found that teaching perceptual

skills to learners improves both perception and production. Their Japanese informants were

trained on perceiving the difference between English /r/ and /l/. The training not only

improved their perception but also their production of the specific English phonemes /r/ and

/l/.

3.1.2 PRODUCTION

Various researchers have pointed out the beneficial effects of production training on learners’

pronunciation. For example, training sessions – specifically aimed at discrimination and

practice23 – have been reported to improve learners’ pronunciation (e.g. Gimson, 1980; quoted in

Kendrick, 1997).

Kendrick (1997) argues that learners need to practice talking in order to improve

pronunciation. She tested students on several tasks and found that all of these exercises resulted

in a better pronunciation24, with the greatest improvement to be noticed in segmental features.

Mildner & Tomić (2007) explored whether speech training (in combination with regular

language classes) helped to improve American English and Spanish native speakers’

pronunciation of Croatian vowels. Acoustic analyses and evaluation processes performed by

native speakers of Croatian led Mildner & Tomić (2007) to the conclusion that students benefit

enormously from speech training, more specifically production training. In other words, they

found that the individual training sessions all students received yielded very good results with

regard to the quality of pronunciation of the foreign language, i.e. Croatian. They also showed

that the native speakers of American English studying Croatian as a foreign language improved

23 Practice designed to work on those specific features which the learner struggles with.

24 The students themselves who were involved in Kendrick’s study all expressed that speaking English as much as possible is of indispensable importance in acquiring the correct pronunciation.


their pronunciation more than the Spanish students. Mildner & Tomić (2007) argue that this

difference arises because the vowel system of Spanish is more similar to that of Croatian, i.e. a

5-vowel system (i, e, a, o, u), than the American English vowel system is (the latter being more

elaborate). They claim that, because the Spanish and Croatian vowel systems are so similar, the

American English students have more room for improvement and therefore show greater gains, since

their vowel system initially deviated more from the one they are studying, i.e. Croatian.

Furthermore, they attributed the varying degree of improvement among individual speakers to

extralinguistic factors, such as motivation and attitude.

Previous studies have shown that perception training influences both perception and

production (cf. Section 3.1.1). Hattori & Iverson (2008) examined whether or not production

training has a similar effect on both perception and production, i.e. if production training

positively influences both perception and production. They found that production training was

only effective on the level of production and not on the level of perception. Hattori (2009), who

researched the perception and production of English /r/-/l/ by adult Japanese speakers, found

results which are in line with those of Hattori & Iverson (2008). Even though the speakers’

production of /r/ and /l/ had clearly improved, they had not necessarily improved their

perception of English /r/ and /l/, i.e. the level of accuracy with which they identified English

/r/-/l/ had not improved drastically. Hence, Hattori (2009) concluded that production training

only leads to improvements in pronunciation but not in perception.

3.1.3 AUDIOVISUAL TRAINING

Hazan et al. (2005) researched the technique of audiovisual training. They conducted various

experiments which were designed to find out whether (1) visual gestures can help speakers to

learn certain phonemes, (2) audiovisual training is more effective than perceptual training and

(3) audiovisual training improves both perception and production.

Hazan et al. (2005) concluded from their research that audiovisual training can be effective,

but only if the visual gestures involved in producing a certain phoneme are easily noticeable for

the learners. However, Hazan et al. (2005) showed that learners are not influenced by visual

cues if these do not carry a phonemic contrast in their native language. In other words, speakers

make no use of the visual cues attached to a particular phoneme – even if the articulatory

gestures are clearly visible – if these gestures do not signal a phoneme contrast in their L1.

Hazan et al.’s (2005) research showed greater improvements in the contrast between labial and

labiodental consonants after audiovisual training than after perceptual training. Improvements

resulting from audiovisual training could be noticed not only in perception but also in

sensitivity to audiovisual cues to the contrast. Students who were trained only on perception

skills did not improve in this respect, i.e. in correctly identifying articulatory

gestures.

Hazan et al. (2005) performed another experiment in which they put a less visually distinct

contrast than that between labial and labiodental consonants to the test, i.e. the contrast

between English /l/ and /r/. However, this test did not provide results supporting the hypothesis

that audiovisual training is more effective than perceptual training, since the

Japanese participants did not show differing results with regard to the different training

methods.

Still, Hazan et al. (2005) found a surprising result regarding the specific method of

audiovisual training. They noticed a considerable difference in improvement between

informants trained with synthetic faces and those who were trained with natural audiovisual

stimuli. The results were in favour of the latter technique, i.e. training with natural faces led to

greater improvements.

Another factor which Hazan et al. (2005) aimed to examine was whether audiovisual

training has an effect on pronunciation which is similar to that of perceptual training (perceptual

training improves both perception and production). In other words, they were eager to find out

if, just like perception, production also benefits from audiovisual training. They discovered that

the articulatory gestures used to produce English /l/ and /r/ also influence the pronunciation,

even without the occurrence of specific pronunciation training.

3.2 REAL-TIME SPECTROGRAMS

Real-time spectrograms25 have been used in several clinical studies (e.g. Chaney, 1988; Huer,

1989; Hagiwara, Fosnot & Alessi, 2002; mentioned in Hattori, 2009) “for perceptual evaluation

of patients’ speech production” (Hattori, 2009: 133). These studies provide evidence for the

effectiveness of the use of spectrograms. Chaney (1988; mentioned in Hattori, 2009) used

spectrograms to analyze American children’s correctly or incorrectly produced semivowels (i.e.

/w/, /r/, /l/ and /j/). Huer (1989; quoted in Hattori, 2009) tested, over a period of 70 days,

a 10-year-old girl who substituted /w/ for /r/. By means of acoustic tracking, Huer (1989;

mentioned in Hattori, 2009) was able to determine whether or not the girl’s speech deficit had

improved. Hagiwara, Fosnot & Alessi (2002; mentioned in Hattori, 2009) carried out an acoustic

analysis, before and after speech therapy, of a 6-year-old who mispronounced /r/. Using

25 Real-time spectrograms are spectrograms which are projected simultaneously with the speech production itself. This way, speakers can instantly notice whether or not they have produced a feature correctly whilst the method they used in the production is still fresh in their memories. They can furthermore compare between different productions of the same feature and thus ascertain themselves what the best method is for them to pronounce the feature in the right way.

this method, they were able to determine that, after therapy, the child’s

pronunciation of /r/ had improved.

Recently, real-time spectrograms have been used by a number of scholars in L2 learning

studies (e.g. Hattori & Iverson, 2008; Hattori, 2009). In their experiment, Hattori & Iverson

(2008) made use of real-time spectrograms to establish whether native speakers of Japanese

had improved their production of English /r/ and /l/ after training. They found that, after

pronunciation training, the Japanese participants pronounced English /r/ more accurately26

than before. However, Hattori & Iverson (2008) also concluded that the speakers improved

neither their accuracy in identifying English /r/ and /l/ nor their

ability to discriminate between the two phonemes.

Following Hattori & Iverson (2008), Hattori (2009) made use of real-time spectrograms in

his study on the perception and production of English /r/ and /l/ by Japanese speakers. During

the sessions, real-time spectrograms were employed when the participants produced English

/r/ and /l/. The study showed that Japanese speakers had benefited from the pronunciation

training, since they produced “more identifiable English /r/ and /l/ syllables after the training”

(Hattori, 2009: 152). Through these positive results, Hattori (2009) was able to confirm that

the procedures employed in the training sessions, i.e. specific instructions and feedback (see

Section 3.3.1), were effective. He states furthermore that “L2 learners seem capable of learning

details of non-native segments (e.g., articulatory movements and temporal information) as long

as specialists (e.g., phoneticians, teachers) orient the L2 learner’s attention to specific aspects of

L2 production.” (Hattori, 2009: 177-178). Hattori (2009) also points out that these results could

suggest that if L2 learners were to be provided with explicit instruction early on in the learning

process, they might not establish incorrect phonetic categories and articulatory movements. In this

way, they would be able to improve L2 phoneme learning more quickly and more easily.

3.3 OTHER TECHNIQUES USED IN PRONUNCIATION TRAINING

There are other factors besides perception, production or audiovisual training which can be

beneficial to a learner’s pronunciation of a non-native language. Two of these are discussed in

the following sections: feedback and contrasting the foreign language with the native language.

26 The greatest improvement could be noticed in those speakers who produced poor English /r/ before training.

3.3.1 FEEDBACK

Various studies on training discussed in the previous sections (e.g. Flege, 1989; Hattori &

Iverson, 2008; Hattori, 2009; see Sections 3.1 and 3.2) have shown a positive effect of feedback

on learners’ pronunciation. The technique of giving immediate feedback to the learner has

proven its effectiveness in training sessions.

In his study on the effect of training on Chinese speakers’ perception of the word-final

English /t/-/d/ contrast, Flege (1989) found a positive effect of feedback training in only two of

his participants. He also addresses the phenomenon of generalization, i.e. generalizing specific

properties – learned in specific tokens – to other words which were not included in the training

session(s). According to Flege (1989), there is a condition which needs to be met in the feedback

training in order to reach generalization. If the feedback training has a beneficial effect on the

trainee’s “tacit knowledge” (Flege, 1989: 1691), the learned features are expected to

generalize to untrained words as well as to trained ones. If, however, only the

phonological or phonetic specification of individual words is affected due to training, no

generalization will be found. Furthermore, Flege argues that

“[t]he multiple natural token approach to speech training assumes that exposure

to the acoustic variation between tokens of a single category will induce subjects

to derive a more general representation than they would derive had they been

trained on just a single token.” (Flege, 1989: 1691)

Bradlow et al. (1997) also found that both perceptual and production knowledge was

generalized to novel words.

Hattori (2009) states that the informants who participated in his research were provided

with instant feedback on their production27. The use of this technique improved the

participants’ pronunciation of English /r/ and /l/, not only in words, but also in sentences and

passages. According to Hattori (2009), the extension of knowledge from words to longer speech

utterances proves that the participants generalized /r/ and /l/ productions to continuous

speech.

27 Hattori (2009) mentions an example of what kind of feedback was given to the participants: “if he [the instructor] found that participants’ English /r/ F3 was too high (e.g., 2500 Hz), he told participants to check their tongue position, tongue shape, and lip shape. If he found that participants were producing good English /r/, he provided positive feedback and encouraged the participants to maintain the articulation and produce the consonant.” (Hattori, 2009: 137).

3.3.2 CONTRASTING WITH NATIVE LANGUAGE

In Collins & Vandenbergen’s Modern English Pronunciation, A practical guide for speakers of

Dutch (2000; henceforth MEP), the predominant technique to teach learners a good

pronunciation of English is to provide learners with practice sentences. These practice sentences

are designed to train learners on specific features, one by one. Other exercises are also included

to instruct features of connected speech, such as intonation and stress. Since this guide, designed

by Collins & Vandenbergen, is aimed specifically at learners from the Dutch-speaking part of

Belgium, a direct comparison is often made between the pronunciation of certain

features in English and that in Dutch. Many of those instructions point out a certain feature which

exists in one of the languages but not in the other. Also, references are made to specific

features of English which differ only slightly from the way they are pronounced in Dutch. These

are then described using a particular word to indicate or illustrate the differences and/or

similarities between English and Dutch. For example, “English /e/ (in DRESS) is a little closer

than Dutch /ε/ (in ZET)” (MEP, p. 37) or also “English /æ/ is much more open than Dutch /ε/ -

nearer in quality to a shortened version of Dutch /a:/ (in LA)” (MEP, p. 38). Targeting a specific

language and highlighting differences and similarities between the languages seems to be an apt

instruction technique for learners of a foreign language.

4. CASE STUDY

4.1 HYPOTHESES

4.1.1 GENERAL AIM OF THE CASE STUDY

The general aim of the case study was to determine in what way pronunciation training affects

the participants’ production of aspiration and prevoicing.

Previous studies (e.g. Collins & Mees, 2008; van Alphen & Smits, 2004) have shown that

both voiced and voiceless plosives are realized differently in Dutch and English. In short,

aspiration is a phonetic process which can be found in English but not in Dutch voiceless stops

and prevoicing can be found in Dutch voiced stops but not in English ones. Therefore it can be

expected that these particular features are difficult to acquire or unlearn, respectively, for

Belgian Dutch speakers of English. On the basis of this knowledge, the following hypotheses can

be proposed.

Since the participants had not received any specific pronunciation training on either

of these features, they are expected to transfer the phonetic realizations of their native language,

i.e. Belgian Dutch, into the second language, i.e. English, in the pretest (see Sections 4.2.2 and

4.3). Aspiration will most likely be absent in their realization of English voiceless plosives

under the influence of the Belgian Dutch voiceless unaspirated equivalents of /p, t, k/. For the

production of prevoicing, it is expected that the informants will be inclined to produce

prevoicing in their realization of the English voiced plosives /b, d/, again as a result of the input

from their native pronunciation, which is marked by prevoicing. In other words, they are

expected to do the exact opposite of what is expected in English, i.e. omit aspiration in voiceless

stops and produce prevoicing in voiced stops.
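
The pretest expectations can be made concrete with a small sketch. It assumes that each token has already been measured for VOT in milliseconds, with prevoicing expressed as negative VOT and aspiration as long positive VOT. The 30 ms boundary between short-lag and aspirated realizations, the function name and the example values are illustrative assumptions and are not taken from the present study.

```python
def classify_vot(vot_ms: float, long_lag_threshold_ms: float = 30.0) -> str:
    """Label a plosive token by its voice onset time (VOT) in milliseconds.

    The threshold is an illustrative assumption, not a value from the study.
    """
    if vot_ms < 0:
        # Voicing starts before the release burst: negative VOT.
        return "prevoiced"
    if vot_ms >= long_lag_threshold_ms:
        # Long positive VOT: the stop is treated as aspirated.
        return "aspirated"
    # Short positive (or zero) VOT: unaspirated, no prevoicing.
    return "short-lag"


# Invented example values, chosen only to show the three outcomes.
for label, vot in [("Dutch-like /b/", -85.0), ("Dutch-like /p/", 12.0), ("English-like /p/", 65.0)]:
    print(label, classify_vot(vot))
```

Under these assumptions, the hypothesis amounts to expecting mostly "short-lag" labels for /p, t, k/ and mostly "prevoiced" labels for /b, d/ in the pretest.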

In the posttests (see Sections 4.2.2 and 4.3), i.e. after they have received one-on-one training

and feedback, the informants are expected to have improved their pronunciation of both

features, since the techniques which were used in the training session have shown beneficial

effects on learners’ pronunciation (e.g. Flege, 1989; Hattori & Iverson, 2008; Hattori, 2009). A

relatively big improvement is expected to arise in the picture-naming task which is identical to

the task in the pretest. The participants’ target-like pronunciation is expected to increase most in the

words that they were trained on during the session (e.g. pig, tent, key, bus, dad).

4.1.2 SPECIFIC HYPOTHESES

A number of specific hypotheses related to the topic can also be suggested. In line with Flege’s

(1989) observations, namely that auditorily easily detectable features of pronunciation are easier

to learn, it can be hypothesized that the habits of the production of aspiration will be more easily

adapted than those of prevoicing, since the latter is not as easily auditorily detected as the

former. With regard to the voiceless plosives, the VOTs of /p, t, k/ are expected to increase

in the order /p/ < /t/ < /k/, because the present study was conducted with adults. Most of the

studies on VOT in voiceless plosives which were conducted with adult informants have reported

this relationship between VOT lengths (cf. Section 2.2.3). Concerning the voiced plosives, the

following specific hypothesis can be put forward. Since the pool of informants in the present

study includes men as well as women, the difference in frequency according to gender (van

Alphen & Smits, 2004; see Section 2.3.3.2) could possibly also be detected. Moreover, the results

of the current study might show a difference in VOT duration which would contradict van

Alphen & Smits’ (2004; see Section 2.3.3.1) findings that no significant difference according to

PoA of voiced plosives can be found.
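
The hypothesized VOT ordering for the voiceless plosives can be checked mechanically once measurements are available. The following is a minimal sketch, assuming the measurements are stored as (plosive, VOT in ms) pairs; the function names and the example values are invented for illustration and do not reproduce any data from the present study.

```python
from statistics import mean


def mean_vot_by_plosive(measurements):
    """Group (plosive, vot_ms) pairs and return the mean VOT per plosive."""
    grouped = {}
    for plosive, vot_ms in measurements:
        grouped.setdefault(plosive, []).append(vot_ms)
    return {plosive: mean(values) for plosive, values in grouped.items()}


def follows_place_order(means, order=("p", "t", "k")):
    """Return True if mean VOT increases strictly along the given order."""
    values = [means[plosive] for plosive in order]
    return all(earlier < later for earlier, later in zip(values, values[1:]))


# Invented example values (ms), not measurements from the study.
example = [("p", 55.0), ("p", 61.0), ("t", 70.0), ("t", 74.0), ("k", 80.0), ("k", 86.0)]
means = mean_vot_by_plosive(example)
print(means, follows_place_order(means))
```

The same comparison could be applied to the duration of prevoicing in /b/ and /d/ to see whether place of articulation makes a difference there, although a significance test would still be needed before drawing conclusions.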

4.2 METHOD

4.2.1 PARTICIPANTS

All thirteen participants were selected on the basis of their age, i.e. between 21 and 31 (mean

age: 24.4), and their language background, i.e. they were all native speakers of Belgian Dutch,

who had knowledge of English but had not necessarily received specific training in

pronunciation. The informants all took part in the experiment on a voluntary basis. Before the

actual experiment started, they were asked to fill out a questionnaire28 (see Appendix A) which

contained queries on their language background, their knowledge of English and a self-

evaluation of their pronunciation of English, alongside two meta-linguistic questions on

aspiration and prevoicing.

Of the thirteen volunteers, eight were female and five were male. All participants – and their

parents – were native speakers of Belgian Dutch. While all of them knew other languages (e.g.

French, German, Italian, Spanish and Japanese), most of them claimed that the only language

they used on a daily basis was Dutch. Four of the thirteen participants stated that they used

English on a daily basis. Only three informants claimed never to have had any contact with

28 Each participant was told that they could fill out the questionnaire in Dutch, if they felt they would otherwise be restricted in giving a satisfactory answer. However, seven of them chose to fill it out in English, which indicates that they felt confident enough to use English to express themselves.

native speakers of English. The other ten had come into contact with native speakers of English –

in both spoken and written form – through friends, relatives, customers, business associates or

colleagues. The same ten informants had also spent some time in English-speaking countries.

Most of them had spent time in these countries on holiday, but others stated that they had gone there

for work or for an internship associated with a student organisation. On average, they had spent

five days to a week in an English-speaking country. One person had been there for only one day,

while another had spent six weeks abroad.

All participants had taken an English course at secondary school, ranging from four to seven

years. While seven of them claimed to have received training in pronunciation, they could not

remember any specific features which had been trained in class. They did, however,

remember the methods which were used in class to improve their pronunciation, i.e. reading

aloud and repetition tasks. At college or university level, five informants had taken an English

class for an average of 2.2 years. Two informants claimed to have received specific instructions

on pronunciation, on all possible features of English. One participant reported having received

training in a so-called language lab. In other words, he was asked to read out loud into a

microphone, his readings were recorded and afterwards, the instructor listened to the sound

file. The instructor then pointed out the mistakes which were made. One of the participants who

took English courses at university or college – but did not receive any pronunciation training –

stated that she wrote her dissertation in English.

All participants, except for one, reported having acquired some of their knowledge of

English from the media, i.e. television, radio, newspapers, magazines, novels, video-games, music

or the internet.

Participants 1, 7 and 12 rated their pronunciation as good to very good and informants 3, 9,

10 and 11 marked their pronunciation of English as okay. Participants 2, 4, 5, 8 and 13, however,

labelled their pronunciation as bad, and informant 6 even rated her pronunciation as

horrible. The informants explained why they had chosen to self-evaluate their English in that

way. Eight of them stated that their pronunciation clearly gives away that they are not native

speakers, and thus felt their pronunciation to be okay, good or very good. Participant 7 stated

that she is often asked whether she is British or whether she has lived in the UK, which is why

she put down very good. Participant 4 stated that she was used to reading in English but not

speaking it, and so she felt that her speaking skills were not satisfactory and marked her own

pronunciation as bad. Informant 6 said that she rated her pronunciation as horrible because it

had never been trained in school, so she did not know which features she needed to improve29.

Even though many of them felt that their pronunciation was okay to very good, all

29 This last claim was also expressed by participant 5.

participants expressed that they would like to improve their pronunciation of English. Reasons

for wanting to improve included: the importance of English as a world language, improving

comprehensibility and achieving a native-like pronunciation. The specific features which the

informants mentioned wanting to improve are the vowels, the

difference between the onset phonemes in that and think, and the difference between final /d/ and

/t/ (as in, for example, bed and bet). One informant (P7) wanted to be able to know the difference

between British and American English pronunciation of certain words and provided the example

of <schedule>.

Informants took part in the experiment voluntarily and were unaware of the purpose before

taking part in the pretest. The questionnaire revealed that only three of the thirteen informants

knew what the process of aspiration entails (and gave a correct example) and just one claimed to

have heard of prevoicing but failed to give an example.

4.2.2 STIMULI AND DESIGN

4.2.2.1 PRE- AND POSTTEST: PICTURE-NAMING TASK

Each participant was asked to perform a picture-naming task30 which consisted of 75 pictures31.

Of these 75 images, 50 were target-tokens and the remaining 25 were fillers. The fillers were

added in order to draw the participants’ attention away from the purpose of the experiment.

Some examples of distractors are flower, lemon and heart. The 50 target tokens consist of 10

words with each of five English plosives (i.e. /p, t, k, b, d/) in the onset, e.g. pig, tent, key, ball, dad.

In the present study, the target stimuli only included plosives in word-initial stressed position;

plosives in other positions were left out because the training session might otherwise have become too

intricate and the participants might have been overloaded with information in a short time span. Only

monosyllables were chosen as target stimuli because these are known to yield the longest

VOTs (e.g. van Alphen & Smits, 2004; Spencer32, 1996). Moreover, the tokens chosen to elicit

prevoicing were all words in which the initial voiced plosive was followed by a vowel. Van

Alphen & Smits (2004) showed that initial voiced plosives followed by a vowel are more often

produced with prevoicing than voiced plosives followed by a consonant. Furthermore, the

duration of prevoicing was also found to be longer in voiced plosives with a vowel as the second

30 A picture-naming task was chosen because it was believed that this way, informants would less easily become aware of which features they were being tested on, and because they are then less influenced in their production. It was thought that by naming pictures, participants would produce the words in a more spontaneous way than they would when reading the orthographic forms of the words.

31 See Appendix B1 for a complete list of the target stimuli and fillers.

32 Spencer (1996) states that aspiration occurs after initial voiceless stops /p, t, k/, at the beginning of any stressed syllable, and furthermore distinguishes between monosyllabic and polysyllabic words.

segment (cf. Section 2.3.3.3). These are the reasons why the present study contained only target-

tokens of words with initial voiced plosives followed by a vowel.

All of the words, which were believed to be fairly easily recognizable, were retrieved from

memory or through the use of dictionaries. The images accompanying them were looked up with

Google Image Search. The pictures were then placed in a PowerPoint presentation33. Some of the

images were complemented by short hints or an explanatory sentence34, in order to heighten the

likelihood of the target words being produced, e.g. This is a ... (cup) of coffee. The images were

shown on a computer screen individually and in random order for a period of 7 seconds each (i.e. the

participants did not need to press any keys; the slides were programmed to proceed

automatically), which was believed to be long enough to identify them. The participants were

however told that if they had not had enough time to name the picture they could click back and

take their time to name it. Furthermore, they were informed that if they thought they had not

produced the right word, they could correct themselves and name the picture again. This did not

affect the results in any way since, naturally, incorrect productions of the target stimuli were

not taken into account in the analysis.

The same picture-naming task was used in the posttest. It took the informants at most 10

minutes to perform the picture-naming task.
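
The design of the picture-naming task can be summarized in a short sketch that builds a randomized slide order and derives the minimum running time. The slide labels are placeholders rather than the actual stimuli listed in Appendix B1, and the use of a seeded shuffle is an assumption; the text does not specify how the random order was generated.

```python
import random

PLOSIVES = ("p", "t", "k", "b", "d")
TOKENS_PER_PLOSIVE = 10   # 5 plosives x 10 words = 50 target tokens
N_FILLERS = 25            # distractor pictures
SECONDS_PER_SLIDE = 7     # automatic slide advance


def build_slide_order(seed=None):
    """Return a shuffled list of placeholder labels for targets and fillers."""
    targets = [
        f"{plosive}-target-{index + 1}"
        for plosive in PLOSIVES
        for index in range(TOKENS_PER_PLOSIVE)
    ]
    fillers = [f"filler-{index + 1}" for index in range(N_FILLERS)]
    slides = targets + fillers
    # A seeded generator makes the order reproducible (an assumption for this sketch).
    random.Random(seed).shuffle(slides)
    return slides


slides = build_slide_order(seed=1)
total_seconds = len(slides) * SECONDS_PER_SLIDE
print(len(slides), "slides,", total_seconds / 60, "minutes without repetitions")
```

With 75 slides of 7 seconds each, the automatic presentation lasts 525 seconds, i.e. just under 9 minutes, which is consistent with the reported maximum of 10 minutes once occasional clicking back is taken into account.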

4.2.2.2 TRAINING SESSION

After having performed the pretest, the participants were given an individual training session of

approximately 25 minutes by a phonetically-trained native speaker of Dutch with a high

proficiency in English. This session was conducted in Dutch rather than in English in order to

ensure that no information was lost on the informants. This way they would also not feel

restricted if they wanted to ask any questions. The training session consisted of two main parts:

a theoretical explanation of aspiration and prevoicing, and a practical production task. During

the session, two laptops were used: one that showed the PowerPoint presentation35 and

one which was used during the exercises.

First, the participants were provided with some theoretical background information on the

process of aspiration (e.g. what is aspiration, what is positive VOT, etc.). This theoretical part only

33 For a complete rendering of the picture-naming task the way it was presented to the volunteers, see Appendix B2.

34 After having analyzed the sound files of three participants (P1, P2 and P4), it was noticed that some of the tokens were named differently than was intended. Moreover, participants themselves expressed doubt about the correct naming of some of the pictures. It was then decided to change some pictures and/or to add a hint to make them more obvious to the rest of the participants so they would produce the target-token as they were meant to be produced (e.g. the token dive was named as swim, so the picture was changed and the comment He likes to scuba … (dive) was added).

35 See Appendix C1 for the entire PowerPoint presentation as it was used during the training session.

included information on aspiration of voiceless plosives in word-initial stressed position. No

information on s + stop clusters or on unstressed syllables (cf. Section 2.2.1) was provided, in

order to keep it simple enough to understand and remember after only a single short session

and because the stimuli did not contain any of these structures. With regard to the process of

prevoicing, likewise, only basic information was provided on the lack of prevoicing in voiced plosives

in English (e.g. what is prevoicing, what is negative VOT, etc.) so as not to overburden the

informants with too much detailed information. The differences between English and Dutch

production of both of these phenomena were explained and demonstrated by means of listening

fragments36 and of stills of some waveforms and spectrograms (cf. Fig. 1, Fig. 2, Fig. 6 and Fig. 7).

Flege (1989) partially attributes the mispronunciation of certain features to the “inability to

perceive L2 phones or phonetic contrasts in a nativelike manner” (Flege, 1989, p. 1684). Hence,

the participants in the current study could benefit from listening to a native speaker producing

aspiration and prevoicing as it should be. After having noticed the contrast with their native

language, i.e. Belgian Dutch, they might be taught the correct pronunciation more easily.

Aside from the purely theoretical background, the informants were given the opportunity to test

how to produce aspiration and how not to produce prevoicing.

Secondly, the final part of the training session contained a few exercises on words which

had occurred in the pretest (e.g. pen, tea, cat, bus, dog) and which would also be tested again in

the posttest37. The participants could record themselves and listen to the recordings, and they could watch

real-time spectrograms (SFS/RTGram, Version 1.3) of their own voices, on which they were given

feedback. They were also given a handout38 which contained a summary of the useful tips

that were explained to them during the theoretical part of the training. This way, if they had not

pronounced aspiration or if they had produced prevoicing, they could refresh their memories on

the articulatory gestures39 involved in both processes.

4.2.2.3 EXPANSION TO THE POSTTEST: WORDS

The posttest did not only contain a picture-naming task identical to that of the pretest, but also a

short reading test40. This was given to them in printed form. Using orthographic forms of

words was expected to have an influence on the participants’ pronunciation. Since they could

36 These soundfiles can also be found on the cd-rom included in Appendix D.

37 This was done in order to see if the words that had specifically been trained on showed a larger improvement than those which were not, or if the participants had generalized the acquired information to so-called ‘new words’ which were not trained on.

38 For an example of the handout, see Appendix C2.

39 These articulatory gestures are different from those in their L1.

40 See Appendix B3 for a complete list of the target-stimuli and the fillers.

already see the phoneme with which the words began, they could consider more quickly how to

produce it correctly. The word-reading task took on average 2 minutes to complete.

The list⁴¹ contained 30 words: 15 distractors (e.g. film, chair, map) and 15 target stimuli

(e.g. pay, text, coast, bed, dust). Each plosive occurred three times.

Five of the target words, one for each plosive, had been trained on, i.e. pig, tent, cat, bus,

dad. This made it possible to check whether these words showed more improvement because

they had been trained on. The informants were asked to read the words out loud at their own pace.

4.2.3 PROCEDURE

Before taking part in the experiment, the informants were asked to complete a questionnaire

which gave insights into their respective language backgrounds, knowledge of English, etc. All

thirteen participants were then asked to perform the pretest, i.e. the picture-naming task. They

were seated in front of a laptop, in a quiet room, with the recording device lying on the table

between them and the computer. The recordings were all made on a Philips Digital Voicetracer

7675.

Before testing began, participants were given oral instructions in Dutch on what was

required of them. The same information was repeated in English in written form on the first

slide of the PowerPoint presentation. The 15 seconds during which the instructions

were shown on the screen gave the participants the opportunity to settle themselves and to

prepare to switch to English. The instructor left the room for the duration of the test so that the

participants would feel more at ease and would not look to the instructor for confirmation, which

would lead to hesitation in pronunciation. They were asked to name out loud the pictures which

appeared on their screen one by one⁴². The pretest took about 15 minutes per person, including

giving the instructions and answering possible questions from the volunteers.

Immediately after the first task, the informants were given the individual training session.

They were told that they could interrupt and ask questions at any time. During this session, a

second laptop was used, running a programme which made it possible for

the informants not only to record themselves and listen back to their own pronunciation but also

to watch real-time spectrograms (SFS/RTGram, Version 1.3). The instructor was seated next to

them and provided them with instant feedback on the words they trained on.

The posttest was conducted the day after the pretest and the training session. For the

picture-naming task, which was identical to the one in the pretest, the same instructions were

41 See Appendix B4 for the complete list as it was given to the informants.

42 After it was noticed that a few of the participants produced an article before the target-word (e.g. a bear), the request not to do this was added in the instructions to the remaining participants of the pretest, and to all participants before the posttest.

repeated. Again, the volunteers were seated at a desk in front of a computer screen, with the

recorder placed between them and the laptop. The test on written words was held afterwards.

The participants were told to read the words out loud at their own pace. The recording device

was placed on the desk next to the paper version of the test. After that, the instructor left the

room for the duration of the tests.

4.2.4 ANALYSIS

Once the sound files were acquired, they were saved on a computer⁴³. Using the software

included with the Philips Digital Voicetracer 7675, the .zva files were converted to .wav format for

analysis. The recordings were analysed in Praat (Boersma & Weenink, 2011). In Praat, the VOTs

were measured (in ms) for both voiced and voiceless plosives. This way it could be determined

whether or not the participants had produced aspiration or prevoicing. To determine production

of prevoicing, the researcher relied on van Alphen & Smits (2004) who stated that:

“The beginning of the prevoicing was defined as the point in time at which evidence

of vocal fold vibration could be detected. Any clearly visible detectable period, no

matter how small in amplitude, was accepted as part of voicing. The end of the

prevoicing was defined as the point in time at which the noise of the release burst

started, visible as a sudden peak in the waveform.”

(van Alphen & Smits, 2004: 461-462)

Aspiration was measured from the onset of the burst up till the onset of voicing for the

following vowel, i.e. up till the moment the waveform became periodic.
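The two measurement criteria above amount to one signed duration per token. The following Python sketch is only an illustration of that idea and is not part of the procedure carried out in Praat: the class `StopLandmarks`, the function `vot_ms` and the time stamps are hypothetical, and the sketch assumes that the onset of the release burst and the onset of vocal fold vibration have already been located by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopLandmarks:
    """Hand-placed landmarks for one word-initial plosive, in seconds."""
    burst_onset: float    # start of the release burst (sudden peak in the waveform)
    voicing_onset: float  # first visible period of vocal fold vibration


def vot_ms(landmarks: StopLandmarks) -> float:
    """Return VOT in ms: voicing onset measured relative to the release burst.

    A negative value means voicing started before the burst (prevoicing);
    a positive value means voicing started after the burst.
    """
    seconds = landmarks.voicing_onset - landmarks.burst_onset
    return round(seconds * 1000.0, 1)


# Invented example time stamps, not taken from the recordings.
aspirated = StopLandmarks(burst_onset=0.200, voicing_onset=0.265)
prevoiced = StopLandmarks(burst_onset=0.200, voicing_onset=0.120)

print(vot_ms(aspirated))  # positive lag
print(vot_ms(prevoiced))  # negative lag (prevoicing)
```

Under this convention a long positive value corresponds to aspiration and a negative value to prevoicing.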

4.3 RESULTS AND DISCUSSION

4.3.1 ANALYSIS

In English, words with the voiceless stops /p, t, k/ in the onset are produced with aspiration.

This leads to a longer VOT than in Dutch, since the process of aspiration is absent in the latter. If

participants produce the words in the pretest with VOTs typical for Dutch, it can be concluded

that their native language, i.e. Belgian Dutch, interfered in the pronunciation of English tokens.

Words with voiced stops /b, d/ in the onset are – in Dutch – typically produced with prevoicing,

which renders negative VOT. In English, this process is not present. In case the informants

43 A copy of the acquired recordings and the analysis of the VOTs done in Praat is provided in Appendix D.

uttered the words with prevoicing in the pretest, this can be attributed to the influence of

Belgian Dutch.

VOTs for both voiced and voiceless stops were measured again after training. If these

results turn out to be better, i.e. longer positive VOTs for voiceless and shorter negative VOTs

for voiced plosives, it can be concluded that training learners has a positive effect on their

pronunciation of English.

The VOTs for all target tokens, for each of the participants and for each of the voiced and

voiceless stops separately, are presented in Appendices E and F. From these results, averages were

calculated per informant and per target token, for each of the plosives separately. Results for the

pretest (see Appendix E) and posttest (see Appendix F) were first kept separate, and then put together

(Appendix G) in the order of the picture-naming tasks as presented to the informants. The

results were then processed in Excel, and put into tables and graphs, in order to highlight any

progress the participants might have made.

First, an analysis was made of the results of the pretest (cf. Section 4.3.2; Appendix E). The

results of this test were helpful in determining the beginning level of VOTs. This way, the results

of the pre- and post-training tests could be compared to each other in order to establish whether

the informants had improved their pronunciation of English. The results for voiced and voiceless

stops were analyzed separately.

Secondly, the results of the posttest (cf. Section 4.3.3; Appendix F) for both voiced and

voiceless stops were analyzed individually. The results of the picture-naming posttest were then

compared to the results from the pretest, to determine whether training had had an effect on the

participants’ pronunciation habits, and whether the words that were specifically trained on

during the pronunciation session showed greater improvement than so-called new words, i.e.

words which did not receive attention during training. The results of the word-reading posttest

were also analyzed. Those words which had already appeared in both picture-naming tasks

were compared to the VOTs from the word-reading task.

4.3.2 PRETEST

As described in Section 4.2.2, all participants performed a picture-naming task which included

words with both voiced and voiceless stops in the onset. The pretest was designed to establish a

level to compare the posttest results to. This way a possible evolution, as a result of the training

session, could become apparent. In the following sections, voiceless and voiced stops will be

discussed separately.

4.3.2.1 ASPIRATION

During the analysis of these results, two trends surfaced: (1) the effect of PoA (discussed in

Section 2.2.3) on the VOTs of the target plosives, i.e. /p, t, k/, manifested itself and (2) a large

number of the VOTs produced in the pretest were target-like⁴⁴ for each of the plosives.

As described in the literature review (cf. Section 2.2.3), PoA has an effect on the VOT of the

voiceless stops /p, t, k/. In Graph 1⁴⁵, it can be noted that eight of the thirteen participants

demonstrated this effect. Their average VOTs show an increase in length from /p/ over /t/ to

/k/, i.e. VOT of p < t < k. The mean VOT value which was calculated for all thirteen informants

shows that the mean VOT for /p/ (48,2 ms) is 15,7 ms shorter than for /t/ (63,9 ms). The mean

VOT for /k/ (67,6 ms), in turn, is 3,7 ms longer than for /t/. These findings confirm that PoA, i.e.

bilabial, alveolar or velar, affects the duration of VOT: on average, the VOTs of the voiceless

plosives relate to each other as p < t < k.
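The ordering of the three pretest means can be checked with a few lines of Python. This is merely a sketch of the arithmetic behind the comparison: the names `PRETEST_MEANS_MS`, `follows_poa_order` and `step_differences` are hypothetical, and the values are the pretest group means reported above, written with decimal points instead of decimal commas.

```python
# Pretest group means in ms, as reported above for all thirteen informants.
PRETEST_MEANS_MS = {"p": 48.2, "t": 63.9, "k": 67.6}


def follows_poa_order(means: dict) -> bool:
    """True if mean VOT increases from bilabial via alveolar to velar."""
    return means["p"] < means["t"] < means["k"]


def step_differences(means: dict) -> tuple:
    """Differences (t - p, k - t) in ms, rounded to one decimal."""
    return (round(means["t"] - means["p"], 1),
            round(means["k"] - means["t"], 1))


print(follows_poa_order(PRETEST_MEANS_MS))
print(step_differences(PRETEST_MEANS_MS))
```

The same two functions can be applied to any single informant's averages in order to see whether that informant shows the p < t < k pattern.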

The results presented in Graph 1 also show that – in this pretest – many participants

already produced a lot of target-like VOTs. Each of the plosives will be discussed separately.

Even though VOTs ranged widely from participant to participant, most informants produced

a relatively large number of target-like VOTs, for /p/ and /t/ as well as for /k/.

Participants 1 and 11 (henceforth, P#) showed the highest VOT values for all of the plosives, i.e.

all above 70,7 ms. The lowest average among all informants for /p/ was 18,1 ms, with an

individual minimum of 6,8 ms in the word <pig>, pronounced by P6. The highest average found for

/p/ was 88,6 ms. The individual maximum of 122,2 ms, which was found in the word <pan>, was

uttered by P11. Of the 130 tokens with /p/ in the onset, 22 were named differently than was

intended or were not uttered at all.

Of the remaining 108 target-tokens, 46 were pronounced with a target-like

VOT, i.e. of 54,15 ms or more. This means that in the pretest, 42,6% of the tokens

with the bilabial voiceless plosive in the onset were already produced with aspiration.
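The labelling rule for voiceless stops (a token counts as target-like when its VOT reaches the reference mean for that plosive) can be written as a simple threshold test. The sketch below is an illustration only, with hypothetical names (`TARGET_THRESHOLD_MS`, `is_target_like`); the thresholds are the values of 54,15 ms, 66,49 ms and 70,02 ms used in this chapter for /p/, /t/ and /k/, and the example VOTs are individual pretest values mentioned in the text.

```python
# Lower bounds for a target-like VOT in ms, per voiceless plosive.
TARGET_THRESHOLD_MS = {"p": 54.15, "t": 66.49, "k": 70.02}


def is_target_like(plosive: str, vot_ms: float) -> bool:
    """A voiceless stop is labelled target-like from the threshold upwards."""
    return vot_ms >= TARGET_THRESHOLD_MS[plosive]


# Individual pretest values quoted in the text.
examples = [("p", 6.8), ("p", 122.2), ("t", 15.9), ("t", 152.7)]
for plosive, vot in examples:
    label = "target-like" if is_target_like(plosive, vot) else "not target-like"
    print(f"/{plosive}/ {vot} ms: {label}")
```

Because the rule is a hard cut-off, a token that falls just below the reference mean is counted as not aspirated, which should be kept in mind when reading the percentages.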

For /t/, the means among all thirteen participants ranged from 34,4 ms to 94,9 ms.

Individual VOTs were measured between 15,9 ms (<tape> produced by P8) and 152,7 ms

(<tea> uttered by P4). In the case of /t/, 26 of the 130 tokens were not named as was

intended or were not produced. Of the remaining 104 tokens, 53 had been

44 The averages – which were used to compare to the ones obtained in this experiment – are the ones calculated from means reported in previous studies (Table 1). These are considered as the target-like VOTs. All the results from this experiment were labeled as target-like, if the VOT for the voiceless stops was anything from the number mentioned in Table 1 upwards.

45 For the exact numbers, see Appendix E1.

[Graph 1: y-axis Positive VOT (ms); x-axis Informants; series /p/, /t/, /k/]

produced with a target-like VOT, i.e. of 66,49 ms or more. This leads to the conclusion that for

/t/ a striking 51,0% of the tokens in the pretest was pronounced with aspiration.

Among all informants, average VOTs for /k/ were recorded between a minimum of 32,1 ms

and a maximum of 35,1 ms. Individual VOTs ranged from as little as 2,4 ms in the word <key>

(produced by P6) to 125,4 ms in the word <cup> (uttered by P13). Ten of the 130 tokens with

/k/ in the onset were named incorrectly, i.e. not the way that was intended by the researcher, or

were not produced. Of the 120 tokens that remain, 58 were produced with a target-like

VOT, i.e. of 70,02 ms or more. In other words, 48,3% of all tokens with velar plosive /k/ in the

onset were pronounced with aspiration.

Taking all tokens and all informants into consideration, the average values for each of the

plosives are 48,2 ms for /p/, 63,9 ms for /t/ and 67,6 ms for /k/. These means come very close to

the means found in previous studies (cf. Table 1). For /p/ this is only 5,95 ms shorter, for /t/

only 2,59 ms shorter and for /k/ only 2,42 ms shorter. Of all three voiceless stops together, the

informants produced an impressive 47,3% with aspiration.
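The percentages for the pretest follow directly from the token counts given above: tokens that were misnamed or not produced are removed first, and the share of target-like tokens is taken over the remainder. The following Python sketch retraces that calculation; the class `Tally` and the function `share` are hypothetical names, and the counts are those reported in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    elicited: int     # tokens that could have been produced (13 informants x 10 words)
    excluded: int     # tokens misnamed or not produced
    target_like: int  # valid tokens at or above the target VOT

    @property
    def valid(self) -> int:
        return self.elicited - self.excluded


def share(target_like: int, valid: int) -> float:
    """Percentage of valid tokens that are target-like, to one decimal."""
    return round(100 * target_like / valid, 1)


PRETEST = {
    "p": Tally(elicited=130, excluded=22, target_like=46),
    "t": Tally(elicited=130, excluded=26, target_like=53),
    "k": Tally(elicited=130, excluded=10, target_like=58),
}

per_plosive = {p: share(t.target_like, t.valid) for p, t in PRETEST.items()}
pooled = share(sum(t.target_like for t in PRETEST.values()),
               sum(t.valid for t in PRETEST.values()))

print(per_plosive)
print(pooled)
```

Note that the pooled figure is computed over all valid tokens together (157 of 332) rather than as the mean of the three percentages, since the number of valid tokens differs per plosive.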

Graph 1 Average VOT results for voiceless plosives in the pretest

Table 5 provides an overview of the number of times (out of a possible 10) each participant

uttered each of the plosives with a target-like VOT. The Table illustrates that P1 uttered tokens

with target-like VOT the most often of all informants, i.e. 93%. P6 and P12 performed the worst,

with only 3% of tokens produced with a target-like VOT. What can also be noticed in the Table is

that the tokens with /k/ in the onset were most often produced with aspiration. On average

among all participants, /k/ was pronounced target-like 4,5 times, while target-like VOTs in /p/

and in /t/ were produced 3,5 and 4,0 times, respectively.

Table 5 Number of times target-like VOT per informant per plosive

All in all, the results of the pretest lead to the conclusion that the informants performed the

test differently than was expected. It was foreseen that more influence of Belgian Dutch would

be noticeable, i.e. that VOTs would be shorter since no aspiration is produced in Dutch. Not only

did the participants produce longer VOTs than was anticipated, they also uttered the voiceless

plosives with target-like VOTs more often than was expected.

4.3.2.2 PREVOICING

The analysis of the results obtained in the pretest of the voiced plosives /b/ and /d/ revealed

five things: (1) the frequency of target-like VOT is not significantly influenced by PoA, (2) the

effect of gender on the production of prevoicing did not manifest itself, (3) the height of the

vowel did not seem to influence the production of prevoicing, (4) prevoicing is a process which

some speakers have the tendency to produce (more consistently) while others do not and (5)

those speakers who did prevoice did it less heavily than was expected.

What the results of the pretest definitely demonstrate is that prevoicing is a process which

some speakers have the tendency to produce while others do not, or not as frequently. Hence,

the two extremes can be found, i.e. from 0 ms to over -200 ms. Each of the plosives is discussed

separately.

The shortest average negative VOT for /b/ was measured at -38,3 ms. The longest average VOT

that was recorded is -143,5 ms. An individual maximum can be found in P13 with a VOT of

-228,7 ms in the word <bus>. The shortest VOT (aside from 0 ms) found in individual results

is -20,6 ms by P3. Of the 130 tokens with the bilabial voiced plosive in the onset, 6 tokens were

not named or were named wrongly. Of the 124 tokens which were named as was intended,

31 tokens, that is 25,0%, had a VOT of 0 ms, i.e. were target-like.

For /d/, mean negative VOTs ranged from -114,8 ms to -24,6 ms. The longest VOT was

measured in the word <dead>, uttered by P12 (-192,2 ms). P9 produced the shortest VOT,

namely -31,4 ms in <dance>. Ten of the total of 130 tokens were not named in the way that was

intended or were not uttered at all. Of the remaining 120 tokens, 37 had a target-like VOT. This

means that 30,8% of the tokens with /d/ in the onset had a VOT of 0 ms.

Taking both voiced plosives into account, 27,9% of all tokens were produced without

prevoicing. This means that the informants prevoiced less often than was expected. Apart

from the 27,9% of tokens produced with a VOT of 0 ms, the mean VOTs reported in Graph 2⁴⁶ are

in line with (or are a little shorter than) the results obtained by Simon (2010), who also tested

prevoicing in English spoken by native speakers of Belgian Dutch (cf. Section 2.3.4).

Even though no clear effect of PoA could be noticed, the

difference in percentages between /b/ and /d/, 25,0% and 30,8% respectively, does show a

slight tendency of the bilabial plosive to be produced more often with prevoicing than its

alveolar counterpart.

46 For the exact numbers see Appendix E2.

Graph 2 Average VOT results for voiced plosives in the pretest

An overview of the number of times (out of a possible 10) each participant uttered each of

the plosives with a target-like VOT is presented in Table 6. The Table shows that P1 consistently

produced prevoicing in the pretest, which is equivalent to 0% target-like VOTs of 0 ms. The best

results are found in P7 and P11, who uttered 55,0% of all tokens target-like. Only a slight

difference can be seen between both voiced plosives: alveolar /d/ was produced with a VOT of

0 ms 2,8 times on average, a mere 0,4 times more often than bilabial /b/ (2,4 times). The bilabial

voiced plosive /b/ as well as the alveolar voiced plosive /d/ were thus produced with a VOT of 0 ms

roughly only 2,5 times, i.e. a fourth of all times.

The anticipated effect of PoA on VOT in voiced plosives did not manifest itself dramatically.

The informants omitted prevoicing only 6 times more often in /d/ compared to /b/ (31 times 0

ms for /b/ and 37 times 0 ms for /d/). The effect of gender is also not noticeably present, nor

is the supposed influence of the height of the vowel (cf. note 21).

[Graph 2: y-axis Negative VOT (ms); x-axis Informants; series /b/, /d/]

Table 6 Number of times target-like VOT per informant per plosive

To sum up, the participants produced a relatively large number of tokens with a target-like

VOT (27,9%). The average VOTs of those tokens which were pronounced with prevoicing were a

little shorter than anticipated on the basis of results obtained in previous studies.

This leads to the conclusion that the informants not only prevoiced less often than expected but

also less heavily than was foreseen.

4.3.3 POSTTEST

The posttest was conducted after each participant had taken part in a one-on-one training

session on the processes of aspiration and prevoicing. The test consisted of two parts, namely a

picture-naming task (identical to the one performed in the pretest) and a word-reading task.

Both tasks will be discussed separately.

4.3.3.1 PICTURE-NAMING TASK

4.3.3.1.1 ASPIRATION

Three things became apparent through the analysis of the results of the posttest picture-naming

task: (1) the effect of PoA shows itself in the mean results, (2) the informants produced many

tokens with aspiration, i.e. VOTs were target-like on many occasions and (3) the results obtained

for those tokens which were trained on during the session did not differ significantly from new

words.

First, the mean VOTs presented in Graph 3⁴⁷ show gradually increasing VOTs, starting from

/p/, going to /t/ and ending at /k/, for seven of the thirteen participants. Of the remaining

six, who do not show this pattern, P4 diverges the most obviously. In the alveolar

plosive /t/, P4 consistently produced VOTs of over 120 ms. As a consequence, the mean

calculated for all informants together shows p < k < t. If P4 is left out of the calculation, the

average VOT for /t/ is 83,0 ms (instead of 87,6 ms). This is 18,9 ms longer than the average

reported for /p/ (64,1 ms) and 1,9 ms shorter than the mean VOT for /k/ (84,9 ms), i.e. VOT p <

t < k.
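The effect of leaving out one informant can be made explicit with the two /t/ means given above. The sketch below is a consistency check rather than a result of the experiment: `mean_of_excluded` is a hypothetical helper, and because the reported means are rounded to one decimal, the value it returns is only an approximation of P4's own average for /t/.

```python
def mean_of_excluded(n: int, mean_all: float, mean_without: float) -> float:
    """Mean of the one participant left out, derived from two group means.

    The sum over all n participants is n * mean_all; the sum over the
    remaining n - 1 participants is (n - 1) * mean_without.
    """
    return n * mean_all - (n - 1) * mean_without


# Group means for /t/ in ms: all thirteen informants, and all but P4.
implied_p4_mean = mean_of_excluded(n=13, mean_all=87.6, mean_without=83.0)
print(round(implied_p4_mean, 1))
```

An implied average well above 120 ms is in keeping with the observation that P4 consistently produced very long VOTs for /t/, and it shows how a single informant can shift the group mean by several milliseconds.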

What can also be retained from looking at the results reported in Graph 3 is that many of

the VOT durations are target-like. The averages for each of the voiceless stops will be discussed

individually.

Starting with /p/, the lowest average duration recorded was 14,8 ms, while the highest

average was 120,1 ms. P2 provided an individual minimum VOT value of 5,2 ms produced in the

onset of the word <pea>. The target-token <pool> produced by P10 contained the individual

maximum VOT of 209,8 ms. Of the 130 tokens, 14 were not produced or

were named differently than was intended. Of the remaining 116 correctly named tokens,

66 tokens were produced with target-like VOTs, i.e. 54,15 ms or more. This means that 56,9% of

all tokens with /p/ in the onset were produced with aspiration.

The average production of VOT in /t/ ranged from 37,5 ms to 143,2 ms. Of all the words

with /t/ in the onset, P2 produced the lowest VOT of 14,9 ms in the word <two>. P4, though,

produced a VOT of 190,6 ms in the word <tea>. Twenty-three out of a possible 130 tokens were

not uttered or were named in a different way than the researcher had intended. The 107

correctly named target-tokens contained 79 tokens which were produced with a VOT of 66,49

ms or more. In other words, 79 tokens – which is 73,8% – were produced with a target-like VOT,

i.e. with aspiration.

The velar voiceless plosive /k/ gave rise to averages for all participants between 49,4 ms

and 135,5 ms. The informants who produced the individual lowest and highest VOTs are P10

and P8, with a VOT of 22,3 ms in <cat> and 154,9 ms in <curl>, respectively. Only 6 of the 130

tokens starting with /k/ were not included in the calculations because they were either not

produced or they were not named as was intended. The remaining 124 tokens consisted of 82

tokens which were pronounced with aspiration, i.e. with a VOT of 70,02 ms or more. This is

66,1%.

In all three voiceless plosives /p/, /t/ and /k/, the maximum and minimum values

described above lie far apart. The largest difference is to be noticed in the

47 For the exact numbers, see Appendix F1.

[Graph 3: y-axis Positive VOT (ms); x-axis Informants; series /p/, /t/, /k/]

alveolar plosive /t/, i.e. 175,7 ms difference between the shortest and longest VOT (14,9 ms as

opposed to 190,6 ms). Of all 390 tokens taken together from all plosives, 43 tokens

had to be excluded because they were not uttered or because the informants gave them a

different name than was intended by the researcher. Of the remaining 347, 227 tokens, or

65,4%, were produced with aspiration.

Graph 3 Average VOT results for voiceless plosives in the posttest picture-naming task

The six words, two for each of the plosives, which were specifically trained on during the

session (i.e. pig, pen, tent, tea, cat and key) did not show a significant difference in length of VOT

compared with the ones which were not trained on. However, the highest individual VOTs could often be

found in the words practiced on in the training session. This was not the case for /p/, but it was

for the other two voiceless plosives. P2, P4 and P13 produced the longest VOTs for /t/ in the

word <tea> and P6 in the target-token <tent>. For /k/, P3, P4, P7, P9, P10 and P12 all produced

the longest VOTs in the word <key>.

Table 7 presents an overview of the number of times (out of a possible 10) each participant

uttered each of the plosives with aspiration. Of all thirteen informants, P8 produced aspiration

the most consistently. She produced target-like VOTs 97% of the time. This stands in stark

contrast to P6 who applied the process of aspiration in the production of initial voiceless

plosives only 10% of the time. On average, the VOTs of the voiceless plosives were produced

target-like 5,7 times out of 10, i.e. more than half of the time. The difference between plosives is

not significant, with a difference of only 0,9 times in favour of /t/ compared to /p/ (5,2 times for

/p/ and 6,1 times for /t/). The difference between /t/ and /k/ is negligible (0,2 times).

Table 7 Number of times target-like VOT per informant per plosive

The results provided in Appendix F1 also show that the highest number of times a word was

produced with a target-like VOT was to be found in one of the words that was trained on. For

/p/, the only word which was produced 9 out of 13 times with a target-like VOT was <pen>. The

word <tea> was produced with a target-like VOT 10 out of 13 times. Eleven out of 13 times, the

word <key> was uttered with a target-like VOT. All three aforementioned words, i.e. pen, tea and

key, were those which were practiced on last in the training session.

All in all, 65,4% of all pronounced tokens were characterized by the production of

aspiration. The words which were used and practiced on during the training session showed

mildly better results than the so-called new words, both in frequency and in duration of VOT.

4.3.3.1.2 PREVOICING

The results of the posttest picture-naming task show that – similar to the pretest – the expected

effects of gender, PoA and the height of the following vowel do not manifest themselves. The

analysis of the results⁴⁸ of the production of the voiced plosives illustrated that more than half of

the tokens were produced with a target-like VOT of 0 ms. Both plosives are discussed separately.

48 The mean results of the picture-naming posttest on prevoicing are presented in Graph 4. The exact numbers of the test can be found in Appendix F1.

[Graph 4: y-axis Negative VOT (ms); x-axis Informants; series /b/, /d/]

The longest reported average VOT value for /b/ is -69,2 ms, while the shortest (apart

from 0 ms) is only -5,8 ms. On an individual level, P10 showed the maximum duration of VOT in

the word <bar> (-190,9 ms). In the word <bird>, P8 produced a VOT of only -8,9 ms. Of the 130

tokens with /b/ in the onset, only 2 were named in another way than was intended or were not

produced at all. A VOT of 0 ms was measured in 67 of the remaining 128 tokens, which means

that 52,3% of all tokens with initial /b/ were not produced with prevoicing.

In the voiced alveolar plosive /d/, the average VOTs lie between 0 ms and -80,0 ms. P5,

however, produced a VOT of -151,9 ms in the token <dead>. Apart from quite a few target-like

VOTs of 0 ms, the shortest negative VOT was produced by P1 in <dance> (-9,7 ms). For /d/,

out of 130 tokens, 9 were named incorrectly or not at all. In 62 of the remaining 121 tokens, a

VOT of 0 ms was recorded, which is 51,2%.

Taking into account all 260 tokens of both plosives produced by all participants, and leaving

aside the 11 which were not named correctly or were not produced at all, 129 tokens had a target-like

VOT. In other words, 51,8% of the 249 remaining tokens were produced without prevoicing.
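The pooled picture-naming figures for the pretest and the posttest can be set side by side as a difference in percentage points. The sketch below is purely descriptive and is not a significance test; `COUNTS`, `pct` and `gain_in_points` are hypothetical names. The posttest counts are those reported in Section 4.3.3.1, and the pretest counts are sums of the per-plosive counts from Section 4.3.2 (46 + 53 + 58 = 157 of 108 + 104 + 120 = 332 voiceless tokens; 31 + 37 = 68 of 124 + 120 = 244 voiced tokens).

```python
# (target-like tokens, valid tokens) per stop class and test.
COUNTS = {
    "voiceless": {"pre": (157, 332), "post": (227, 347)},
    "voiced": {"pre": (68, 244), "post": (129, 249)},
}


def pct(target_like: int, valid: int) -> float:
    """Unrounded percentage of valid tokens that are target-like."""
    return 100 * target_like / valid


def gain_in_points(stop_class: str) -> float:
    """Posttest share minus pretest share, in percentage points."""
    pre = pct(*COUNTS[stop_class]["pre"])
    post = pct(*COUNTS[stop_class]["post"])
    return round(post - pre, 1)


for stop_class in COUNTS:
    print(stop_class, gain_in_points(stop_class))
```

The difference is taken between unrounded shares and rounded only at the end, so it can deviate by 0,1 from a subtraction of the rounded percentages quoted in the text.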

Graph 4 Average VOT results for voiced plosives in the posttest picture-naming task

Four specific words, two for each of the plosives (i.e. ball, bus, dog and dad), were given as

practice words in the training session. These words did not render significantly shorter VOTs –

apart from the cases in which the VOT was 0 ms – than those which were considered as new

words.

Table 8 shows the number of times (out of 10) each participant uttered each of the two

plosives with a target-like VOT. The overview illustrates that P11 performed the picture-naming

posttest on prevoicing the worst, with only 10% target-like VOTs. P3 pronounced 90% of all

tokens without prevoicing, i.e. 80 percentage points more than P11, and thus performed the

best.

Similar to the pretest, only a minor difference between PoAs, i.e. between the bilabial and the

alveolar stops, can be seen in the posttest. Tokens with /b/ in the onset (5,2 times) were

produced with a VOT of 0 ms on average only 0,4 times more often than tokens with /d/ (4,8 times).

Also, no contrast between male and female speakers was found in terms of frequency of prevoicing.

Neither was the height of the following vowel an important factor in the production of

prevoicing.

Table 8 Number of times target-like VOT per informant per plosive

With regard to the trained words, i.e. bus, ball, dog and dad, no drastic difference was found

between those words which were trained and those which were untrained. However, two of the

trained words, one for each of the voiced stops, were produced without prevoicing the most times

of all tokens. In other words, <bus> gave rise to 9 times 0 ms and so did <dad>.

To sum up, more than half of all uttered tokens were produced without prevoicing. This

means that the informants were less inclined to prevoice than was expected (through the

influence of their native language). The words which were trained on showed a slight advantage

in frequency of target-like production.

4.3.3.2 WORD READING TASK

The final task which the informants were asked to perform was a word-reading task. It had been

suggested that the visual cues of orthographically presented words would simplify the process of

knowing when to aspirate or when not to produce prevoicing. The results for voiceless and

voiced plosives will be discussed one by one.

4.3.3.2.1 ASPIRATION

The word-reading task again showed that VOTs for /p/ are shorter than the ones for /t/, which

are in turn shorter than the VOTs for /k/. Six out of the 13 participants showed this relationship

between the voiceless plosives, and the mean for all participants together confirms it again. The

results⁴⁹ of the final task also show that the participants produced a large number of target-like

VOTs, i.e. they produced aspiration much of the time, and one participant even consistently did

so. Furthermore, the words which the informants trained on did not render target-like VOTs

more frequently than the other words but they did show longer VOTs than the untrained words.

Each plosive will be discussed individually.

For the bilabial plosive /p/, average VOTs produced by all participants ranged from 23,3 ms

to 143,5 ms. The shortest and longest individual VOTs can both be found in the token <pig>

produced by P10 (13,1 ms) and by P6 (156,3 ms), respectively. Of the 39 tokens, 28 tokens were

produced with a target-like VOT for /p/. This is 71,8% of all tokens with /p/ in the onset.

The minimum and maximum mean VOT values for /t/ were 35,5 ms and 142,2 ms, respectively. The shortest individual VOT (20,7 ms) occurred in <tie> produced by P12. A very long VOT of 224,2 ms was produced by P6 in the word <tent>. One of the 39 tokens was produced incorrectly⁵⁰. Of the other 38 tokens, 27 had a target-like VOT, which is 71,1% of the correctly produced tokens.

Mean VOTs for /k/ ranged from 42,9 ms to 150,0 ms. Of all tokens with /k/ in the onset, P4 produced the minimum VOT value (37,7 ms in the target-token <corn>), while P8 produced the maximum of 170,5 ms (in the word <coast>). Of the 39 tokens, 27 were pronounced with a VOT that was target-like for the velar plosive /k/. This means that

69,2% of tokens with /k/ in the onset were aspirated.
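The percentages reported for the three voiceless plosives follow directly from the token counts given above. As a minimal sketch (the variable and function names are illustrative and not part of the original analysis), the shares per plosive and the pooled share can be recomputed as follows:

```python
# (target-like tokens, correctly produced tokens) per plosive,
# as reported in the text for the posttest word-reading task.
counts = {"p": (28, 39), "t": (27, 38), "k": (27, 39)}


def percentage(hits: int, total: int) -> float:
    """Share of target-like tokens in percent, rounded to one decimal."""
    return round(100 * hits / total, 1)


# Share of aspirated tokens for each plosive separately.
per_plosive = {stop: percentage(h, n) for stop, (h, n) in counts.items()}

# Pooled share across all three plosives.
pooled_hits = sum(h for h, _ in counts.values())
pooled_total = sum(n for _, n in counts.values())
pooled = percentage(pooled_hits, pooled_total)

for stop, share in per_plosive.items():
    print(f"/{stop}/: {share}%")
print(f"all plosives: {pooled_hits}/{pooled_total} = {pooled}%")
```

Note that the denominator for /t/ is 38 rather than 39, because one token was mispronounced and is excluded from the count.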

On average, taking into account all participants and all three plosives, 70,7% of all tokens, i.e. 82 out of 116 tokens, were produced with aspiration.

[Graph 5: positive VOT (ms), axis from 0,0 to 160,0, per informant; series: /p/, /t/, /k/]

Graph 5 Average VOT results for voiceless plosives in the posttest word-reading task

49 The mean results of the word-reading posttest are presented in Graph 5. For the exact numbers of the test, see Appendix F2.
50 P3 said */θent/ instead of [tʰent].

No significant difference in VOT duration can be found between trained and untrained

words. Of the three tokens starting with /p/, only P6 produced the longest VOT in <pig>, i.e. the

word with /p/ in the onset which was trained on during the practical part of the training

session. Of the three tokens with /t/ in the onset, 7 participants (P1, P2, P4, P6, P7, P11 and P11)

produced the longest VOT in the trained word, i.e. <tent>. P4 produced a VOT of 218,0 ms in

<tent>, which is a striking 181,7 ms longer than the second highest VOT value (in <tie> with

105,8 ms). Five informants produced the longest VOTs in the trained word with /k/ in the onset

(<cat> produced by P1, P3, P6, P7 and P10). The VOT in <cat> pronounced by P10 is 37,5 ms

longer than the second highest VOT (i.e. 84,8 ms in <corn>).


Table 9 Number of times target-like VOT per informant per plosive

Table 9 shows, for each participant, the number of times (out of a possible 3) each plosive was produced with a target-like VOT. The table illustrates that P9 performed the worst, with only 10% target-like VOTs. P1 and P8, with 100% of tokens pronounced target-like, had the best results. On average, the participants produced 2 out of the 3 target-tokens with target-like VOTs, i.e. 66,7%.

The words which were practiced during the training session were not produced with target-like VOTs more often than the words which received no attention during training.

In short, the word-reading task gave rise to a large number of target-like voiceless stops, in some cases even 100% and on average 70,7%. The trained words were not produced with aspiration more often than the others, but they did show longer VOTs.

4.3.3.2.2 PREVOICING

The posttest word-reading task on prevoicing showed remarkable results⁵¹. Three of the thirteen informants did not produce prevoicing in any of the tested cases. The results also showed that the trained words were not noticeably more affected by the training than the other target-tokens tested in the word-reading task. Each plosive is discussed individually.

51 Graph 6 represents the mean results obtained in the word-reading task on the production of prevoicing. The exact numbers for this test can be found in Appendix F2.

[Graph 6: negative VOT (ms), axis from 0,0 to 160,0, per informant; series: /b/, /d/]

Mean values for /b/ range from 0 ms to -159,4 ms. The longest individual negative VOT, -198,3 ms, was produced by P10 in the token <bus>. In 19 of the 39 cases (48,7%), a VOT of 0 ms was recorded. P2 clearly found it harder to avoid prevoicing in /b/ (-124,0 ms) than in /d/ (-35,5 ms): the difference in average VOT between the two plosives in her case is 88,5 ms. This is an individual instance in which an effect of PoA on prevoicing can be observed, with /b/ produced with a longer negative VOT than /d/. However, this single case does not allow any general conclusions on this topic.

The recorded mean VOTs for /d/ in the word-reading task range from 0 ms to -123,3 ms. The word <dunk> yielded the longest negative VOT, i.e. -153,0 ms, uttered by P5. One of the 39 target-tokens for /d/ was pronounced incorrectly⁵². Of the other 38 tokens, 19 had a VOT of 0 ms. This means that in 50% of all cases, no prevoicing was produced.

If all the tokens uttered by all thirteen participants are taken into account, 38 of the 77 tokens were pronounced with a target-like VOT of 0 ms, which is roughly half (49,4%).

Graph 6 Average VOT results for voiced plosives in the posttest word-reading task

52 Surprisingly, P12 replaced initial /d/ in <dunk> by /θ/.


The words which were specifically concentrated on during the training session, i.e. <bus>

and <dad>, did not give rise to significantly shorter VOTs than the other so-called new words,

except in the case of P2.

Table 10 Number of times target-like VOT per informant per plosive

Table 10 presents the number of times (out of a possible 3) each plosive was produced with a target-like VOT by each participant. The table shows that, on the one hand, P5, P10 and P11 produced no target-like VOTs (0%) and, on the other hand, P1, P4 and P7 produced 100% of the tokens without prevoicing. These two extremes, from 0% to 100% target-like production, clearly illustrate the tendency of some speakers to prevoice and of others not to. The table also shows that /b/ and /d/ were each produced with a VOT of 0 ms on average 1,5 out of 3 times, i.e. half of the time. In other words, the frequency with which the participants produced prevoicing was not influenced by PoA, since both plosives gave rise to the same number of target-like VOTs.

The trained word <bus> was produced 6 times without prevoicing, which is as often as <bed> (6 times) and less often than <boat> (7 times). The trained token with /d/ in the onset, <dad>, was produced without prevoicing 9 out of 13 times. This is 3 more times than <dust> and 5 more times than <dunk>. In other words, <dad> seemed to be a word which the informants found easier to

produce without prevoicing. This could be attributed to the fact that the token <dad> was

practiced on during the one-on-one training session.
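The per-word counts above can be checked against the per-plosive totals reported earlier for the word-reading task (19 tokens without prevoicing for /b/ and 19 for /d/). The sketch below assumes that the differences reported for <dust> and <dunk> are absolute, i.e. 3 and 5 tokens fewer than <dad>; the dictionary layout and all names are illustrative only.

```python
# Number of tokens produced without prevoicing (VOT of 0 ms) per word
# in the word-reading task. <bus> and <dad> are the trained words.
zero_vot = {
    "b": {"bus": 6, "bed": 6, "boat": 7},
    # Assumption: <dust> and <dunk> are derived as <dad> (9) minus 3 and minus 5.
    "d": {"dad": 9, "dust": 9 - 3, "dunk": 9 - 5},
}
trained = {"b": "bus", "d": "dad"}

# Per-plosive totals, to compare with the totals reported in the text.
totals = {stop: sum(words.values()) for stop, words in zero_vot.items()}


def trained_is_best(stop: str) -> bool:
    """True if the trained word has strictly more tokens without prevoicing
    than every untrained word with the same plosive."""
    words = zero_vot[stop]
    others = [n for word, n in words.items() if word != trained[stop]]
    return words[trained[stop]] > max(others)


for stop in zero_vot:
    print(f"/{stop}/: total {totals[stop]}, trained word best: {trained_is_best(stop)}")
```

Under this reading, both totals agree with the 19 tokens reported per plosive, and only the trained word for /d/ stands out from the untrained words.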


To summarize, the words which were specifically trained did not yield significantly different results from those which were not practiced during the session. Nonetheless, the informants produced approximately half of all tokens without prevoicing.

4.3.4 PRETEST VS. POSTTEST: A COMPARISON

In this section, the results of the pretest and those of the posttest are compared⁵³ in order to see whether or not the informants improved from the pre-training to the post-training tests. In the following sections, aspiration and prevoicing are discussed separately.

4.3.4.1 ASPIRATION

The differences in VOT duration between the pre- and posttest are discussed first. The results obtained from both picture-naming tasks, presented in Graph 7, show that all participants except one (P1)⁵⁴ on average produced longer VOTs for all plosives in the posttest than in the pretest. Eight of the thirteen informants produced even longer VOTs in the posttest word-reading task. On average, VOT duration increases from the pretest through the posttest picture-naming task to the posttest word-reading task. P6 showed the greatest improvement of all participants in the word-reading task: her mean VOTs there were on average 87,2 ms longer than in the pretest and 75,2 ms longer than in the picture-naming posttest. P8 also improved considerably in VOT length, from 39,7 ms in the pretest to 125,5 ms and 135,7 ms in the picture-naming and word-reading posttests, respectively. In the posttest word-reading task, P6 corrected herself after having said <cat> with a VOT of approximately 23 ms. When she produced the word a second time, she added aspiration (VOT of 135,7 ms). Self-correction did not always lead to an improvement in VOT production. P10 uttered the target-token <tongue> a first time in the posttest but corrected himself because he knew he had not pronounced it correctly, i.e. he had not produced aspiration. The second time, the VOT was only a little longer and still not target-like.

It became clear during the training that P12 experienced difficulty in producing aspiration, especially in initial /p/ and /t/. While he was practicing, the instructor noted that he often uttered /θ/ instead of [tʰ]. He was corrected by the instructor, who explained that this was not the aim. By the end of the training session, P12 had improved his pronunciation of all three plosives. Unfortunately, in the posttest, he produced quite a lot of /θ/ where he should have aspirated.

Once, he even produced /θ/ in a word with /d/ in the onset (namely <dunk> in the word-reading task). The only other participant who confused aspiration with /θ/ was P3, but only in a single production (<tent> in the word-reading task).

[Graph 7: positive VOT (ms), axis from 0,0 to 140,0, per informant; series: pretest, posttest picture-naming, posttest word-reading]

Graph 7 Comparison between mean VOTs produced in voiceless stops in the pre- and posttest

53 For a list of the results of the pre- and posttest picture-naming task, in the order in which the tests were conducted, see Appendix G.
54 The difference, however, is not significant (only 2,8 ms).

Secondly, the mean number of times a target-like VOT was produced also increased from the pre- to the posttest. On average, each plosive was produced with a target-like VOT two more times in the posttest picture-naming task than in the pretest. The informant with the largest increase was P13, with 3,2 more times in the posttest than in the pretest.

The words which the participants were specifically trained on did not show particularly greater improvements than those which did not receive any special attention during the training session. The participants also improved their VOTs in other words. Even in some cases where a word other than the target-word was uttered, also with a voiceless stop in the onset, an improvement could be observed. For example, P3 uttered <cake> instead of <pie> in both pre- and posttest. The second time the word <cake> was pronounced, the informant produced a VOT of 101,3 ms, an increase of 50 ms over the VOT of 51,3 ms produced in the pretest. In other words, P3 produced aspiration in the word <cake> in the posttest. Another example can be found in P8, who said <coconut> instead of <palm> in both picture-naming tasks. The first time, the word was uttered with a VOT of 35,6 ms, i.e. not target-like. In the posttest, P8 produced a

clearly target-like VOT of 123,1 ms. These two cases illustrate that improvement in VOT

duration was not limited to only those words which the participants practiced on during the

training session. The informants generalized the information received in the pronunciation

training session to new words with /p/, /t/ or /k/ in the onset.
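The gains in the two substitution examples are simple differences between posttest and pretest VOTs. A brief sketch, using only the values quoted above (the tuple layout and the function name are illustrative choices, not part of the original analysis):

```python
# (participant, word actually uttered, pretest VOT in ms, posttest VOT in ms)
examples = [
    ("P3", "cake", 51.3, 101.3),
    ("P8", "coconut", 35.6, 123.1),
]


def vot_gain(pre_ms: float, post_ms: float) -> float:
    """Increase in VOT from pretest to posttest, rounded to 0.1 ms."""
    return round(post_ms - pre_ms, 1)


# Gain per (participant, word) pair.
gains = {(who, word): vot_gain(pre, post) for who, word, pre, post in examples}

for (who, word), gain in gains.items():
    print(f"{who} <{word}>: +{gain} ms")
```

Both words were untrained substitutions, so a positive gain here illustrates generalization beyond the practiced items.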

4.3.4.2 PREVOICING

With regard to the process of prevoicing, some participants changed their pronunciation

drastically. Three participants (P1, P4 and P7) performed the posttest word-reading task

perfectly, i.e. they did not produce prevoicing in any of the target-tokens with a voiced plosive in

the onset. All but two participants (P10 and P11) produced shorter VOTs in the posttest picture-

naming task compared to the pretest. Nine informants produced shorter VOTs in the posttest

word-reading task compared to the pretest. Not all informants however produced shorter VOTs

in the word-reading task than in the posttest picture-naming test. For seven of them (P2, P3, P5,

P10, P12 and P13) the word-reading task rendered longer VOTs than the picture-naming task in

the posttest. Looking only at the mean results presented in Graph 8, an improvement can be

found from the pretest to both posttests.

P1 was the only participant who prevoiced consistently in both /b/ and /d/ and produced the longest mean VOTs in the pretest. He went on to omit prevoicing entirely (100%) in the word-reading posttest. Surprisingly, P11 did worse in the posttest picture-naming task on prevoicing than in the pretest. In the posttest she produced only 10% of all target-tokens with a target-like VOT of 0 ms, whereas in the pretest this was 55%.

[Graph 8: negative VOT (ms), axis from 0,0 to 160,0, per informant; series: pretest, posttest picture-naming, posttest word-reading]

Graph 8 Comparison between mean VOTs produced in voiced stops in the pre- and posttest

As in the case of aspiration, self-correction did not always lead to better results. P12, for example, corrected himself a few times in the word <dad> but still did not manage to produce the token without prevoicing. In other instances in which P12 corrected himself, because he knew he had still produced prevoicing when he should not have, he succeeded only a few times (e.g. the VOT of the initial /b/ in <bus> went from -76,3 ms to 0 ms in the word-reading task). Surprisingly, in the pretest, P11 said <dæns> without prevoicing, but when she corrected herself and said <da:ns>, she produced prevoicing (VOT of -65,6 ms). A possible explanation could be that, since the second utterance was a correction of the first, she hyper-articulated in order to make sure that she pronounced the word correctly the second time.

The words which the participants were trained on did not yield significantly shorter VOTs than those which were not included in the training session. However, the greatest number of times each plosive was produced without prevoicing is found in the practiced words (i.e. <bus> and <dad>). On average, tokens were produced without prevoicing 2,4 more times in the posttest picture-naming task than in the pretest. P3 produced the most tokens with a target-like VOT (9 times) but P1 showed the greatest improvement, i.e. from 0 times in the pretest to 7,5 times in the word-reading posttest. Strikingly, P11 produced target-like tokens without prevoicing 4,5 fewer times in the posttest than in the pretest (from 5,5 times in the pretest to only 1 time in the posttest).


4.3.4.3 GENERAL DISCUSSION

Table 11 provides an overview of the average VOT durations for all plosives. The table shows that, on average, the informants produced VOTs in voiceless plosives that were approximately 23 ms longer in the posttest than in the pretest. For the voiced plosives /b/ and /d/, the corresponding improvement is about 26 ms.

Table 11 Comparison between VOTs in pretest and posttest

The table also illustrates that the VOTs of voiceless plosives become progressively longer from the pretest through the posttest picture-naming task to the posttest word-reading task. For the voiced plosives, this is not the case. While there is a clear improvement from the pretest to the posttest in general, the word-reading task did not yield VOTs closer to the target-like VOT of 0 ms than the picture-naming posttest.

Those participants who reported having received pronunciation training before taking part in the experiment did not produce more target-like VOTs in the pretest. Nor did the informants who stated that they had heard of the processes of aspiration and prevoicing perform better⁵⁵ than those who had not.

It could be noted that those who had rated their pronunciation of English as relatively good performed better in both tests than those who were very uncertain in naming the words. Some stated not only that their pronunciation of English was not good, but also that they were not that good at English in general. For those informants (P5 and P6), the lower level of proficiency made it difficult to pay extra attention to the production of either feature. However, P6 did show considerable improvement in the production of both aspiration and prevoicing.

Overall, an 18,1% and 23,9% increase of target-like VOTs from pre- to posttest for

aspiration and prevoicing, respectively, proves that even a single short session has an impact on

speakers’ pronunciation of voiced and voiceless plosives.

55 Not even those who were able to provide a correct example in the questionnaire, i.e. [kʰæt] or [tʰent].


4.3.5 SUMMARY

Contrary to what was hypothesized, before taking part in the training session, the participants

on average already produced a great number of target-like VOTs in voiceless plosives. They did

however produce even more target-like VOTs after training. All tests confirmed that PoA

influences VOT, with velar /k/ rendering the longest, alveolar /t/ the intermediate and bilabial

/p/ the shortest VOTs. Since the informants produced even longer VOTs in the posttest word-reading task, it can be suggested that orthographic cues help learners ascertain when to produce aspiration.

Prevoicing proved to be a process which certain speakers have a greater tendency to produce than others, as was suggested in the literature (see Section 2.3). In other words, some participants already did not prevoice in some instances in the pretest, some improved after training and some did not succeed as well as others in avoiding prevoicing in English. The informants did not prevoice as heavily or as frequently as was anticipated. The expected differences between places of articulation and between genders were not significant. On average, no greater improvement was found in the word-reading task, which leads us to suggest that the orthographic forms of words do not aid speakers in avoiding prevoicing.

In short, the pronunciation training proved to be successful for both aspiration and prevoicing.


5. CONCLUSION

Native speakers of Belgian Dutch are influenced in the pronunciation of English by their L1.

Since the majority of the participants in the current study had not received any particular

training in pronunciation, interference from their L1 was definitely expected to occur before

they took part in a specially designed training session on aspiration and prevoicing.

The general aim of the study was to find out whether or not a single training session could have a positive effect on the production of aspiration and prevoicing by native speakers of Belgian Dutch. The results showed a clear improvement from the pretest to the posttests. In other words, the pronunciation training session was effective. Since a single training session can improve a speaker's pronunciation, this implies that if more attention were paid to pronunciation training in classes (at secondary school and/or at university or college), more non-native speakers of English might acquire a more native-like pronunciation. It must be recognized, though, that with as few participants as were studied here, it is impossible to draw a general conclusion on this topic. A larger group of informants trained and tested over a longer period of time could yield a more general conclusion on the effect training has on the pronunciation of English by non-native speakers.

A possible explanation for the overall positive results could be the training methods which were used during the session. It could be that the combination of perception, production and the use of real-time spectrograms provided the informants with sufficient information to understand the phenomena well enough to apply them not only in the words which were specifically practiced but also in new words.

More training sessions might yield even greater improvements, since a clearly positive effect of the pronunciation training can already be detected after only one session. The informants who did not improve significantly would probably benefit the most from more training sessions. However, I believe that their minor improvement is due rather to the fact that they do not master the language well enough to pay extra attention to pronunciation features as specific as aspiration and prevoicing. These participants showed that it might be more beneficial for them to train the correct pronunciation through orthographically spelled words, since they showed longer VOTs and more often target-like VOTs in the word-reading task than in the picture-naming tasks.

A possible suggestion for further research could be to use the combination of pronunciation training techniques on other features of English. Combining the widely researched methods, i.e. perception training, production training, audiovisual training and the use of real-time spectrograms, rather than using only one of them, might yield the best results when training a non-native speaker in pronunciation.




APPENDICES

APPENDIX A: QUESTIONNAIRE

QUESTIONNAIRE

Informant No.: ……   Age: ……   Gender: male/female

Language background

- Native language: ……
- Native language of your parents:
  o Mother: ……
  o Father: ……
- Language(s) used on a daily basis: ……
- Besides Dutch, have you studied any other languages? Yes/No
  o Which one(s)? ……
- Have you had any contact with native speakers of English? (e.g. friends, relatives, colleagues, fellow students, etc.) Yes/No ……
  o How? Spoken/written/both
- Have you spent any time in English-speaking countries? Yes/No
  o What is the longest period you have ever spent in an English-speaking country? ……
  o In what context? (e.g. work, vacation, foreign exchange student programme, etc.) ……

Knowledge of English

- Did you take an English course at secondary school? Yes/No (In case your answer is no, skip this question and go ahead to the next one)
  o How many years? ……
  o Did you get any specific training in pronunciation? Yes/No
    Which skills were trained during the course? (e.g. vowels, the difference between that and think, the difference between bet and bed, etc.) ……
    How were these skills trained? (e.g. reading aloud, repetition task, etc.) ……
- Did you take an English course at college or university? Yes/No (In case your answer is no, skip this question and go ahead to the next one)
  o How many years? ……
  o Did you get any specific training in pronunciation? Yes/No
    Which skills were trained during the course? (e.g. vowels, the difference between that and think, the difference between bet and bed, etc.) ……
    How were these skills trained? (e.g. reading aloud, repetition task, etc.) ……


- Did you take an adult education English course? Yes/No (In case your answer is no, skip this question and go ahead to the next one)

o Why? (e.g. to improve your knowledge of English, to improve your pronunciation, to train in speaking English, etc.) ………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………....................................... ...............

o How many years? ………………………………………………………………………………………………………. o Did you get any specific training in pronunciation? Yes/No

Which skills were trained during the course? (e.g. vowels, the difference between that and think, the difference between bet and bed, etc.) ………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………..........

How were these skills trained? (e.g. reading aloud, repetition task, etc.)? ………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………..........

- Did you get (part of) your knowledge of English via the media? Yes/No (Please mark which ones)

o Television

o Radio

o Newspaper

o Magazine

o Internet

o Other: …………………………………………………………………………………………………………………………

Self-evaluation of pronunciation of English

- How would you rate your pronunciation of English? (Please mark one of the possibilities)

o Horrible

o Very bad

o Bad

o Okay

o Good

o Very good

o Perfect

- Why did you rate yourself in that way? (e.g. because people do not always understand what you are saying, because you do not experience any problems with comprehensibility but your pronunciation clearly gives away that you are not a native speaker, because your pronunciation is native-like, etc.) ………………………………………………………………………………………………………………………………………………..………………………………………………………………………………………………………………………………………………..………………………………………………………………………………………………………………………………………………............

- Would you like to improve your pronunciation of English? Yes/No (Please give a reason why in any case)

o Why? (e.g. to improve comprehensibility, to achieve native-like pronunciation, you do not think it is necessary, etc.) ………………………………………………………………………………………………………………………………………………..………………………………………………………………………………………………………………………………………………..………………………………………………………………………………………………………….......

o Which feature(s) would you like to improve? (e.g. vowels, the contrast between that and think, the difference between bet and bed, etc.) (If your answer to the previous question was no, skip this one and go ahead to the next one) ………………………………………………………………………………………………………………………………………………..………………………………………………………………………………………………………………………………………………..………………………………………………………………………………………………………….......

Some final questions

- Have you heard of the process of aspiration? Yes/No

o Could you give an example? ……………………………………………………………………………………..

- Have you heard of the process of prevoicing? Yes/No

o Could you give an example? ………………………………………………………………………………………

THANK YOU FOR YOUR PARTICIPATION!

APPENDIX B: PRETEST AND POSTTEST

1. LIST OF TOKENS USED IN PRE- AND POSTTEST PICTURE-NAMING TASK

ASPIRATION

/P/ /T/ /K/

1. palm talk call

2. pan tall car

3. pea tape card

4. pear tea cat

5. pen tent cold

6. pie time cow

7. pig toe cup

8. pill tongue curl

9. pink toy key

10. pool two king

PREVOICING

/B/ /D/

1. back dad

2. ball dance

3. bar dark

4. bear day

5. beer dead

6. bike deer

7. bird dive

8. bomb dog

9. box door

10. bus duck

DISTRACTORS

1. ambulance

2. apple

3. chicken

4. ear

5. eat

6. egg

7. elephant

8. fish

9. flower

10. grass

11. hand

12. heart

13. lemon

14. light

15. milk

16. moon

17. nurse

18. old

19. orange

20. out

21. rainbow

22. sheep

23. spoon

24. volcano

25. world

Note: all tokens were presented to the informants in random order; see Appendix B.2.

2. SLIDES AS PRESENTED IN PRE- AND POSTTEST PICTURE-NAMING TASK

3. LIST OF TOKENS USED IN POSTTEST WORD-READING TASK

ASPIRATION

/P/ /T/ /K/

1. pay tent cat

2. pig text coast

3. pin tie corn

PREVOICING

/B/ /D/

1. bed dad

2. boat dunk

3. bus dust

DISTRACTORS

1. ankle

2. chair

3. enter

4. film

5. filter

6. green

7. hope

8. instant

9. link

10. map

11. sheets

12. sun

13. swim

14. under

15. walk

Notes:

- Tokens in bold had already occurred in the pre- and posttest picture-naming task, and were trained on during the training session.

- All tokens were presented to the informants in random order; see Appendix B.4.

4. LIST OF WORDS AS PRESENTED IN POSTTEST WORD-READING TASK

Practice words

1. film

2. coast

3. under

4. pay

5. link

6. tent

7. ankle

8. cat

9. enter

10. bus

11. chair

12. hope

13. tie

14. swim

15. corn

16. walk

17. dad

18. sun

19. pig

20. filter

21. dust

22. sheets

23. pin

24. boat

25. instant

26. text

27. bed

28. green

29. dunk

30. map

APPENDIX C: TRAINING SESSION

1. SLIDES USED IN TRAINING SESSION ON ASPIRATION AND PREVOICING

2. HANDOUT WITH TIPS ON ASPIRATION AND PREVOICING

Aspiration

How to test for aspiration? Hold a small piece of paper in front of your mouth. Now say the words <pig>, <pen>, <tent>, <tea>, <key> and <cat>. If the piece of paper moved noticeably, you produced aspiration. If it did not move, try again.

How to improve your production of aspiration?

- For /p/:

o Relax your lips

o Release the tension that is present when speaking Dutch

- For /t/:

o Relax your tongue

o Use the tip of your tongue

o Release the tension that is present when speaking Dutch

- For /k/:

o Relax your tongue

o Release the tension that is present when speaking Dutch

Prevoicing

How to test for prevoicing?

Feel whether your vocal folds vibrate when you pronounce the words <bus>, <ball>, <dad> and <dog>. If you felt your vocal folds vibrate before you pronounced the /b/ or /d/, you produced prevoicing. Try again without letting your vocal folds vibrate.

How to avoid producing prevoicing in English? Create a little more tension in your tongue and in your vocal tract before you produce the /b/ or /d/, so that your vocal folds start vibrating less readily.

APPENDIX D: COPY OF RECORDINGS

APPENDIX E: RESULTS OF PRETEST

1. ASPIRATION

/P/

/T/

/K/

2. PREVOICING

/B/

/D/

Notes:

- In case a word was not named correctly, the word that was uttered instead was added in the table (e.g. * swim).

- In case no word was uttered, the symbol / was added.

APPENDIX F: RESULTS OF POSTTEST

1. PICTURE-NAMING TASK

ASPIRATION

/P/

/T/

Notes:

- In case a word was not named correctly, the word that was uttered instead was added in the table (e.g. * cake).

- In case no word was uttered, the symbol / was added.

- P12 often mistook aspiration (e.g. <tea>, [tʰi:]) for /θ/ (e.g. <tea>, *[θi:]). In that case, the words in the table were spelled using <th> as /θ/.

/K/

PREVOICING

/B/

/D/

2. WORD-READING TASK

ASPIRATION

/P/ /T/

/K/

PREVOICING

/B/ /D/

Notes:

- P3 mistook aspiration in [tʰent] for /θ/, i.e. *[θent]. In that case, the word in the table was spelled using <th> as /θ/.

- P12 mistook the absence of prevoicing in <dunk> for /θ/, i.e. *[θunk]. In that case, the word in the table was spelled using <th> as /θ/.

APPENDIX G: RESULTS OF PRE- AND POSTTEST PICTURE-NAMING TASK

Notes:

- In case a word was not named correctly, the word that was uttered instead was added in the table (e.g. * greet).

- In case no word was uttered, the symbol / was added.

- P12 often mistook aspiration (e.g. <tea>, [tʰi:]) for /θ/ (e.g. <tea>, *[θi:]). In that case, the words in the table were spelled using <th> as /θ/.