Journal of Phonetics - Hanyangsite.hanyang.ac.kr/documents/24916/113960/Cho_Lee_Kim... ·...

18
Communicatively driven versus prosodically driven hyper-articulation in Korean Taehong Cho a,n , Yoonjeong Lee a , Sahyang Kim b a Hanyang Phonetics and Psycholinguistics Lab, Department of English Language and Literature, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul, Republic of Korea b Department of English Education, Hongik University, Seoul, Republic of Korea article info Article history: Received 27 June 2010 Received in revised form 7 February 2011 Accepted 10 February 2011 Available online 2 May 2011 abstract This study investigated how three different kinds of hyper-articulation, one communicatively driven (in clear speech), and two prosodically driven (with boundary and prominence/focus), are acoustic- phonetically realized in Korean. Several important points emerged from the results obtained from an acoustic study with eight speakers of Seoul Korean. First, clear speech gave rise to global modification of the temporal and prosodic structures over the course of the utterance, showing slowing down of the utterance and more prosodic phrases. Second, although the three kinds of hyper-articulation were similar in some aspects, they also differed in many aspects, suggesting that different sources of hyper- articulation are encoded separately in speech production. Third, the three kinds of hyper-articulation interacted with each other; the communicatively driven hyper-articulation was prosodically modu- lated, such that in a clear speech mode not every segment was hyper-articulated to the same degree, but prosodically important landmarks (e.g., in IP-initial and/or focused conditions) were weighted more. Finally, Korean, a language without lexical stress and pitch accent, showed different hyper- articulation patterns compared to other, Indo-European languages such as Englishi.e., it showed more robust domain-initial strengthening effects (extended beyond the first initial segment), focus effects (extended to V1 and V2 of the entire bisyllabic test word) and no use of global F0 features in clear speech. Overall, the present study suggests that the communicatively driven and the prosodically driven hyper-articulations are intricately intertwined in ways that reflect not only interactions of principles of gestural economy and contrast enhancement, but also language-specific prosodic systems, which further modulate how the three kinds of hyper-articulations are phonetically expressed. & 2011 Elsevier Ltd. All rights reserved. 1. Introduction Speech production is by nature variable, so that no single utterance can be repeated with exactly the same physical properties. Some aspects of speech variability may be inevitably physiological or biomechanical, as no two speakers can have exactly the same vocal tract configurations nor can even the same speaker assume the exact same articulatory posture twice. However, speech variability may also arise in linguistically or communicatively relevant ways. One source of linguistically relevant speech variation can be found with commu- nicative conditions, as described by the H&H theory (Lindblom, 1990). For example, when speakers face communicatively adverse situations (e.g., speaking in noise, or talking to hearing-impaired or non-native listeners), they are likely to make efforts to produce the utterance more clearly (or hyper-articulate it) to enhance its intelligibility. On the other hand, when the communicative situation is optimal (e.g., speaking in a quiet environment or talking casually to a friend about mutually understood topics), speakers are likely to employ a low-cost form of motor behavior to produce the utterance (or hypo-articulate it). As a result of such communicatively driven adjustment of speech production, an utterance may be produced with a wide range of acoustic-phonetic variation along the hypo- to hyper-articulated speech continuum (see Smiljanic ´ & Bradlow, 2009 for a comprehen- sive review). Another source of linguistically relevant speech variation is prosodic structure. Spoken utterances are prosodically modified, depending on the prosodic structure with which the utterance is produced (see Cho, in press; Shattuck-Hufnagel & Turk, 1996 for reviews). The prosodic structure of an utterance is known to be expressed not only by suprasegmental features such as F0, duration, and amplitude, but also by segmental or articulatory features. For example, speech production can vary with degree of prominence, such that segments are articulated strongly in lexically stressed syllables, and the lexically stressed segments are articulated even more strongly when they receive higher-order prominence such as Contents lists available at ScienceDirect journal homepage: www.elsevier.com/locate/phonetics Journal of Phonetics 0095-4470/$ - see front matter & 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.wocn.2011.02.005 n Corresponding author. Tel.: + 82 2 2220 0746; fax: + 82 2 2220 0741. E-mail addresses: [email protected] (T. Cho), [email protected] (Y. Lee), [email protected] (S. Kim). Journal of Phonetics 39 (2011) 344–361

Transcript of Journal of Phonetics - Hanyangsite.hanyang.ac.kr/documents/24916/113960/Cho_Lee_Kim... ·...

  • Journal of Phonetics 39 (2011) 344–361

    Contents lists available at ScienceDirect

    Journal of Phonetics

    0095-44

    doi:10.1

    n Corr

    E-m

    yj.cynth

    journal homepage: www.elsevier.com/locate/phonetics

    Communicatively driven versus prosodically driven hyper-articulationin Korean

    Taehong Cho a,n, Yoonjeong Lee a, Sahyang Kim b

    a Hanyang Phonetics and Psycholinguistics Lab, Department of English Language and Literature, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul, Republic of Koreab Department of English Education, Hongik University, Seoul, Republic of Korea

    a r t i c l e i n f o

    Article history:

    Received 27 June 2010

    Received in revised form

    7 February 2011

    Accepted 10 February 2011Available online 2 May 2011

    70/$ - see front matter & 2011 Elsevier Ltd. A

    016/j.wocn.2011.02.005

    esponding author. Tel.: +82 2 2220 0746; fax

    ail addresses: [email protected] (T. Cho),

    [email protected] (Y. Lee), [email protected]

    a b s t r a c t

    This study investigated how three different kinds of hyper-articulation, one communicatively driven (in

    clear speech), and two prosodically driven (with boundary and prominence/focus), are acoustic-

    phonetically realized in Korean. Several important points emerged from the results obtained from an

    acoustic study with eight speakers of Seoul Korean. First, clear speech gave rise to global modification of

    the temporal and prosodic structures over the course of the utterance, showing slowing down of the

    utterance and more prosodic phrases. Second, although the three kinds of hyper-articulation were

    similar in some aspects, they also differed in many aspects, suggesting that different sources of hyper-

    articulation are encoded separately in speech production. Third, the three kinds of hyper-articulation

    interacted with each other; the communicatively driven hyper-articulation was prosodically modu-

    lated, such that in a clear speech mode not every segment was hyper-articulated to the same degree,

    but prosodically important landmarks (e.g., in IP-initial and/or focused conditions) were weighted

    more. Finally, Korean, a language without lexical stress and pitch accent, showed different hyper-

    articulation patterns compared to other, Indo-European languages such as English—i.e., it showed more

    robust domain-initial strengthening effects (extended beyond the first initial segment), focus effects

    (extended to V1 and V2 of the entire bisyllabic test word) and no use of global F0 features in clear

    speech. Overall, the present study suggests that the communicatively driven and the prosodically

    driven hyper-articulations are intricately intertwined in ways that reflect not only interactions of

    principles of gestural economy and contrast enhancement, but also language-specific prosodic systems,

    which further modulate how the three kinds of hyper-articulations are phonetically expressed.

    & 2011 Elsevier Ltd. All rights reserved.

    1. Introduction

    Speech production is by nature variable, so that no singleutterance can be repeated with exactly the same physical properties.Some aspects of speech variability may be inevitably physiological orbiomechanical, as no two speakers can have exactly the same vocaltract configurations nor can even the same speaker assume the exactsame articulatory posture twice. However, speech variability may alsoarise in linguistically or communicatively relevant ways. One sourceof linguistically relevant speech variation can be found with commu-nicative conditions, as described by the H&H theory (Lindblom, 1990).For example, when speakers face communicatively adverse situations(e.g., speaking in noise, or talking to hearing-impaired or non-nativelisteners), they are likely to make efforts to produce the utterancemore clearly (or hyper-articulate it) to enhance its intelligibility. On

    ll rights reserved.

    : +82 2 2220 0741.

    c.kr (S. Kim).

    the other hand, when the communicative situation is optimal (e.g.,speaking in a quiet environment or talking casually to a friend aboutmutually understood topics), speakers are likely to employ a low-costform of motor behavior to produce the utterance (or hypo-articulateit). As a result of such communicatively driven adjustment of speechproduction, an utterance may be produced with a wide range ofacoustic-phonetic variation along the hypo- to hyper-articulatedspeech continuum (see Smiljanić & Bradlow, 2009 for a comprehen-sive review).

    Another source of linguistically relevant speech variation isprosodic structure. Spoken utterances are prosodically modified,depending on the prosodic structure with which the utterance isproduced (see Cho, in press; Shattuck-Hufnagel & Turk, 1996 forreviews). The prosodic structure of an utterance is known to beexpressed not only by suprasegmental features such as F0, duration,and amplitude, but also by segmental or articulatory features. Forexample, speech production can vary with degree of prominence,such that segments are articulated strongly in lexically stressedsyllables, and the lexically stressed segments are articulated evenmore strongly when they receive higher-order prominence such as

    www.elsevier.com/locate/phoneticsdx.doi.org/10.1016/j.wocn.2011.02.005mailto:[email protected]:[email protected]:[email protected]/10.1016/j.wocn.2011.02.005

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 345

    accentuation or focus (e.g., in English nuclear pitch accent) (e.g., Cho,2004, 2005; Cho & Keating, 2009; de Jong, 1995, 2004; Fowler, 1995).Prosodically driven articulatory variation is also found with domain-initial strengthening: segments that occur in the initial position oflarger prosodic units (after a larger prosodic boundary, e.g., theIntonational Phrase, IP, boundary) are produced more strongly thanthose in the initial position of smaller prosodic units (after a smallerprosodic boundary, e.g., the Word boundary) (e.g., in English, Cho,2005; Cho & Keating, 2009; Fougeron & Keating, 1997; inFrench, Fougeron, 2001; in Tamil, Byrd, Kaun, Narayanan, &Saltzman, 2000; in Dutch, Cho & McQueen, 2005; in German, Kuzla,Cho, & Ernestus, 2007; in Japanese, Onaka, 2003; Onaka, Watson,Palethorpe, & Harrington, 2003; in Korean, Cho & Keating, 2001; Jun,1995; Kim, 2003). The degree of domain-initial strengthening hasbeen thought to perform the function of marking prosodic bound-aries, which contributes to signaling prosodic structure (cf. Cho, inpress).

    The sources of speech variation discussed so far can largely bedivided into three different types, all of which are thought to giverise to some kind of hyper-articulation. When Lindblom (1990)first introduced the H&H theory, hyper-articulation was expectedto occur globally in the utterance, which may be termed commu-nicatively driven (global) hyper-articulation. The term ‘communi-catively driven’ could be used in a broader sense, given that theultimate goal of speech production, in most cases, is communica-tion. However, in the present study, we use the term narrowlydefined as driven by adverse communicative situations whichwould call upon speakers to employ a clear speech mode for theimprovement of the overall intelligibility of the utterance, as usedin Lindblom (1990). de Jong (1995) distinguished this fromprominence-induced hyper-articulation by characterizing thelatter as localized hyper-articulation as it is local to a lexicallystressed syllable with nuclear pitch accent. The boundary-induced articulatory strengthening (e.g., domain-initial strength-ening) can also be thought of as another kind of localizedhyper-articulation as its effect occurs in the vicinity of prosodicjuncture (cf. Byrd & Saltzman, 2003; Cho & Keating, 2009). Thelatter two kinds of localized hyper-articulation can therefore becalled prosodically driven (local) hyper-articulation.

    The primary purpose of the present study is to investigate howthe different sources of hyper-articulation are acoustic-phoneti-cally realized in Korean by examining the above three conditions(i.e., clear speech, prominence, and boundary conditions) simul-taneously. One of the important questions is about the extent towhich they are similar and different in their acoustic-phoneticcharacteristics. The three kinds of hyper-articulation are allcharacterized by some kind of extreme articulation, and they,whether communicatively or prosodically driven, converge onone common goal—i.e., the successful delivery of linguisticmessages to the listener as intended by the speaker, which isperhaps best described by the Jakobson, Fant, and Halle’s (1965)classic statement, ‘‘We speak to be heard in order to be under-stood.’’ One can therefore expect that there will be at least someacoustic-phonetic patterns that all three types of hyper-articula-tion have in common. In principle, however, the sources of hyper-articulation are different in terms of communicative and/orlinguistic functions—i.e., enhancing the global intelligibility ofthe utterance (in a clear speech mode), demarcating the contin-uous speech into prosodic groups (marking prosodic boundaries)or signaling information locus (marking prominence). Theassumed different functions allude to separate encoding ofdifferent types of hyper-articulation in speech planning, whichmakes it plausible that they are phonetically expressed in adistinct way—at least in some phonetic dimensions. Previousstudies have indeed provided partial support for this predictionwith English. Speakers differentiate prominence-induced from

    boundary-induced hyper-articulation, supporting the view thatprosodic strengthening serves dual functions, prominence mark-ing and boundary marking, and they are encoded separately inspeech planning (Cho & Keating, 2009; Keating & Shattuck-Hufnagel, 2002; see Xu & Wang, 2009, for a related claimregarding boundary marking in Mandarin). However, no studieshave systematically compared all three types of hyper-articula-tion in any given language, addressing the issue of multiplefunctions of hyper-articulation. In the present study, we thereforeexplore this issue with Korean by examining how the prosodicallydriven hyper-articulation is distinguished from the communica-tively driven hyper-articulation, and how prominence- versusboundary-induced hyper-articulations are further differentiatedfrom each other.

    The second goal of the present study is to explore whether andhow the different sources of hyper-articulation interact with eachother. Although the communicatively driven and prosodically drivenhyper-articulation can be independently motivated (e.g., prosodicstrengthening may occur in both the casual and the clear speechmodes), they appear not to be mutually exclusive. Smiljanić andBradlow (2008b), for example, showed that more prosodic phrasesare formed in clear speech than in casual speech in English,demonstrating that modification of prosodic structure can be com-municatively driven. As now understood, prosodically driven hyper-articulation has functions to mark the information loci or linguisti-cally important positions (cf. Keating & Shattuck-Hufnagel, 2002) andthe resulting local phonetic clarities are assumed to facilitate speechcomprehension (Cho, McQueen, & Cox, 2007; Cutler & Butterfield,1992; Gow, Melvold, & Manuel, 1996). It is then reasonable to assumethat, all else being equal, the goal of the communicatively drivenhyper-articulation is more effectively achieved if speakers heightenthe phonetic clarity of segments in prosodically important locations(i.e., accented syllables or domain-initial positions) more than thosein prosodically weak locations (cf. Smiljanić & Bradlow, 2008a). Wewill test this possibility by examining the interaction effects betweenthe three types of hyper-articulation in Korean.

    The third goal of the present study is to expand our knowledge onhyper-articulation to a language other than Indo-European languages.We have chosen Korean as a test language for a number of reasons.There have been increasingly a large number of studies reported onclear speech effects primarily with English and a few other Indo-European languages, such as Spanish and Croatian (e.g., Bradlow,2002; Bradlow & Bent, 2002; Helfer, 1998; Picheny, Durlach, & Braida,1986; Schum, 1996; Smiljanić & Bradlow, 2005, 2008a, b; Uchanski,2005). Previous studies with English have definitely improved ourunderstanding of clear speech effects under various adverse commu-nication situations with different listener populations (e.g., listenerswith hearing impairments, elderly adults, non-native listeners, andchildren with learning impairments). Our knowledge on clear speecheffects in non-European languages such as Korean, however, has beenextremely limited (cf. Smiljanić & Bradlow, 2009). Only a few studies(e.g., Kang, 2010; Kang & Guion, 2008) have examined clear speechaspects of Korean with limited scopes (e.g., focusing only on how thethree-way stop contrast in Korean is enhanced in a clear speechmode), and as a result there remains much room for furtherinvestigation. Our knowledge on prosodic strengthening in Koreanhas been quite limited as well. Although domain-initial strengtheningeffects have been investigated with Korean (Cho & Keating, 2001; Jun,1993; Kim, 2003), no studies have examined interactions betweenboundary-induced strengthening and prominence (focus)-inducedstrengthening in Korean, let alone interactions between prosodicallydriven and communicatively driven hyper-articulation.

    Expanding our knowledge of both clear speech effects andprosodic strengthening effects in Korean will therefore provide usa better and more balanced insight into the different types ofhyper-articulation in general, and it will serve as a basis for

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361346

    understanding cross-linguistic similarities and differences ofhyper-articulation in particular, by allowing us to compareKorean data with already-existing data in English and otherlanguages. In what follows, we will discuss specific questionsthat bear on language-specificity and cross-linguistic similaritiesthat are to be addressed in the present study.

    The first specific question is concerned with the relationshipbetween the scope of domain-initial strengthening and the prosodicsystem of a given language. Quite a few studies (Cho & Keating, 2001,2009; Cho & McQueen, 2005; Fougeron & Keating, 1997; Keating,Cho, Fougeron, & Hsu, 2003) have demonstrated that boundaryeffects are mainly local to the consonant in domain-initial CV, andthe effects on the following vowel have been found to be ratherlimited (cf. Byrd, 2000; Byrd, Krivokapić, & Lee, 2006; Cho, 2006,2008; Cho & Keating, 2009; Krivokapić, 2007). For example, inEnglish, although phonological vowel features are known to beenhanced in a way to enhance phonological contrasts betweenvowels in a clear speech mode (Smiljanić & Bradlow, 2005, 2008a)as well as in accented (focused) condition (Cho, 2005; de Jong, 1995,2004), boundary-induced hyper-articulation effects have not beenclearly observed in terms of enhancement of vowel features (Cho,2005; Cho & Keating, 2009). Barnes (2002) attributed the lack ofinitial strengthening effects on the vocalic articulation in CV syllablesto the specific role of the vowel in English that is arguably reservedfor acoustic manifestation of lexical stress. By examining domain-initial strengthening patterns in four different languages, includingKorean and English, Keating et al. (2003) also suggested that the morerobust domain-initial strengthening effects found in Korean than inEnglish may be due to the fact that Korean marks prosodic structureprimarily by phrasing while both phrasing and prominence markingsare employed in English.

    It is therefore possible that, unlike English, languages withoutlexical stress and pitch accent such as Korean would show morerobust domain-initial strengthening effects, such that the effects mayspread well into the vowel in a domain-initial CV syllable as itsdomain of influence is not restricted by the lexical prominencesystem. In the present study, we will test this prediction by examin-ing whether the three peripheral vowels /i,a,u/ in Korean are hyper-articulated in domain-initial CV position, and if so, how they arerealized in connection with enhancement of the phonological contrastbetween the vowels. We will also examine the extent to whichboundary-induced hyper-articulation (domain-initial strengthening)spreads into the test word even beyond the first syllable to see howfar the effect can be extended beyond the segments immediatelyadjacent to the boundary (cf. Byrd et al., 2006; Cho, 2008; Cho &Keating, 2009; Krivokapić, 2007). Any observable boundary-inducedhyper-articulation effects will then be compared with prominence(focus)-induced and clear speech-induced hyper-articulation effectsto see whether and how the three kinds of hyper-articulation aredifferentiated in the acoustic realization of the vowels.

    Our next specific question is concerned with the relationshipbetween the vowel inventory size and the vowel space expansion inhyper-articulation environments. Smiljanić and Bradlow (2008a)explored this question comparing English and Croatian in a clearspeech mode, and showed a cross-linguistically comparable degree ofvowel space expansion, despite the fact that the two languages differdrastically in the number of contrastive vowels (14 versus 5 vowelsfor English and Croatian, respectively). Based on the results, theyconcluded that the vowel contrast enhancement can be considered asuniversally applicable in clear speech production, irrespective ofvowel inventory size. Korean also has a relatively small vowelinventory size with 7 contrastive vowels (Shin & Cha, 2003). Crucially,the three peripheral vowels /i,a,u/ are each positioned in a section ofthe vowel space with no adjacent neighboring vowels in that section(which would otherwise jeopardize their distinctiveness). Hence,there would be no compelling force to expand the vowel space in

    the clear speech mode. In order to address this issue, we will examinehow the vowel (F1–F2) space formed by the three peripheral vowelsin Korean is expanded by all three kinds of hyper-articulationconditions. We will then compare the results with English andCroatian data, and discuss implications for the principles of contrastenhancement and effort minimization (Liljencrants & Lindblom,1972; Lindblom, 1990; see also Flemming, 1995, 2001).

    Finally, the Korean data will allow us to consider a question aboutthe universality versus language-specificity of pitch range expansionassociated with hyper-articulation. Previous studies on both Englishand Croatian (Smiljanić & Bradlow, 2005) showed a global expansionof pitch range in clear speech, and in a review paper, Smiljanić andBradlow (2009) implied that pitch range expansion may be auniversally applicable feature of clear speech. It is of course a viableassumption that the perceptibility of the intonational structure maybe enhanced with more pitch excursion, which may in turn con-tribute to the intelligibility enhancement in clear speech, as foundwith other languages (e.g., English and Croatian). However, Englishand Croatian both have lexical stress generally expressed by high(rising) pitch (as well as increase in duration and amplitude,cf. Lehiste, 1970), and the pitch excursion is expected to be heigh-tened with higher F0 when it receives phrasal accent (e.g., focus)serving as the locus of prominence (i.e., the head of a prosodic phrase;cf. Beckman & Edwards, 1994). It is then plausible that the pitch rangeexpansion in clear speech may be a feature of prosodic systems withlexical stress and pitch accent, while languages without lexical stresssuch as Korean may not employ pitch excursion as much.

    In sum, the present study investigates systematic acoustic-pho-netic variation in Korean as a function of different hyper-articulationconditions, in order to understand how communicatively driven (in aclear speech mode) versus prosodically driven (domain-initial orfocused) hyper-articulation effects are similar and different, how theyinteract, and how they reflect the language-specific phonological andprosodic systems of Korean.

    2. Method

    2.1. Subjects and recording

    Eight male native speakers of Seoul Korean participated in thisexperiment. All speakers were students at Hanyang University inSeoul and were paid for the participation in the study. The partici-pants were not aware of the purpose of the present study. Theacoustic data from five speakers were collected in a sound attenuatedbooth along with the articulatory data acquisition using Electromag-netic Midsagittal Articulography (Carstens Articulograph AG 200)with an AGK C420 head-mounted microphone at a sampling rate of44 kHz. The articulatory data, which we plan to report in the future,are currently being analyzed. The remaining three speakers’ acousticdata were collected, without acquisition of EMA data, at a samplingrate of 44 kHz using a SHURE KSN44 dynamic microphone and aTascam HD-P2 digital recorder in a sound attenuated booth at theHanyang Phonetics and Psycholinguistics Lab.

    2.2. Test sentences and procedure

    The test syllables were /phi, pha, phu/. The bilabial aspirated stop/ph/ was chosen as it is known to show a robust prosodicstrengthening (Cho & Jun, 2000; Jun, 1993, 1995). The threeperipheral vowels /i,a,u/ were included to see how the acousticvowel space formed by them varies as a function of three hyper-articulation-inducing factors. They were embedded in disyllabicwords in Korean (i.e., /phatPen/ ‘Korean pancake’, /phipu/ ‘skin’,and /phutiF/ ‘pudding’). These words were then included in sen-tences with which the three critical factors were manipulated: the

  • Table 1Experimental sentences with the test syllable /pha/ in /phatPPen/ in IP-initialfocused (a), IP-initial unfocused (b), IP-medial focused (c), and IP-medial unfo-

    cused (d) conditions. The test word is underlined, and focused items are in bold.

    (a) IP-initial /pha/: Focused

    Q: hjoli-nXn entPena, [IP tPhikhin-hako sul-Xl mek-ni]?Hyori-Top. always chicken-and wine-Acc. eat-Q

    ‘‘Does Hyori always eat chicken and wine?’’

    A: ani, hjoli-nXn entPena, [IP phatPen-hako sul-Xl meke]. (Test Sentence)No, Hyori-Top. always pancake-and wine-Acc. eat

    ‘‘No, Hyori always eats PAJUN (pancake) and wine.’’

    (b) IP-initial /pha/: Unfocused

    Q: hjoli-nXn entPena, [IP phatPen-hako kholla-lXl mek-ni]?Hyori-Top. always pancake-and coke-Acc. eat-Q

    ‘‘Does Hyori always eat pajun and coke?’’

    A: ani, hjoli-nXn entPena, [IP phatPen-hako sul-Xl meke]. (Test Sentence)No, Hyori-Top. always pancake-and wine-Acc. eat

    ‘‘No, Hyori always eats pajun (pancake) and WINE.’’

    (c) IP-medial /pha/: Focused

    Q: hjoli-nXn entPena, [IP kokuma tonk*asX joli-laF sotfu-lXl mek-es*-ni]?Hyori-Top. always sweet potato pork cutlet dish-and soju-Acc. eat-Past-Q

    ‘‘Did Hyori always have the sweet potato pork cutlet with soju (liquor)?’’

    A: ani, hjoli-nXn entPena, [IP kokuma phatPen joli-laF sotfu-lXl mek-es*e]. (TestSentence)

    No, Hyori-Top. always sweet potato pancake dish-and soju-Acc. eat-Past

    ‘‘No, Hyori always had the sweet potato PAJUN (pancake) dish with soju(liquor).’’

    (d) IP-medial /pha/: Unfocused

    Q: hjoli-nXn entPena, [IP kamtPa phatPen joli-laF sotfu-lXl mek-es*-ni]?Hyori-Top. always potato pancake dish-and soju-Acc. eat-Past-Q

    ‘‘Did Hyori have the potato pajun (pancake) dish with soju (liquor)?’’

    A: ani, hjoli-nXn entPena, [IP kokuma phatPen joli-laF sotfu-lXl mek-es*e]. (TestSentence)

    No, Hyori-Top. always sweet potato pancake dish-and soju-Acc. eat-Past

    ‘‘No, Hyori always had the SWEET POTATO pajun (pancake) dish with soju

    (liquor).’’

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 347

    communicatively driven factor was Speaking style (Clear versusCasual conditions), and the prosodically driven factors were Bound-ary (IP-initial versus IP-medial conditions) and Prominence (Focusversus Unfocused conditions). Test sentences for /pha/ are given inTable 1 (see the Appendix for the other test sentences). The variousconditions associated with these factors were obtained with thefollowing procedure.

    In order to induce variation with the factors as naturally aspossible in a limited laboratory setting, and to avoid orthographicinfluences, a mini discourse situation was created in such a way thatthe subjects were shown a contextual picture on a computer screenand were asked to answer questions according to the context given inthe pictures without written scripts. For example, the IP-initial /pha/in the focused condition (Table 1a) was created as follows. Thesubject was first given the contextual picture with a famous Koreancelebrity ‘Hyori’ eating pajun /phatPen/ (a test word, ‘Korean stylepancake’) with wine. The experimenter then asked the question,‘‘Does Hyori always eat CHICKEN and wine?’’ by intentionally using‘‘chicken’’ instead of the target word pajun in the picture. The subjectwas asked to answer the question in a complete sentence accordingto the contextual information given in the picture by correcting thestatement as in, ‘‘No, Hyori always eats PAJUN and wine,’’ whichinduced a corrective narrow focus on the target word pajun.1 Speak-ers had to go through a practice session before actual recordingbecause they often produced sentences with unintended renditions.

    1 Note that we have employed ‘lexical’ focus on the target word, in that the

    target word was lexically different from the word to be corrected with no

    phonological resemblance between them. The focus manipulated in the present

    study was therefore different from phonological or segmental focus which high-

    lights the contrast in terms of a particular phonological feature or a segment while

    the rest of the word remains constant (e.g., as employed in de Jong, 2004 and van

    Heuven, 1994).

    The practice session was especially necessary in order to reduce thetime taken for the EMA recording as speakers’ endurance is generallylimited in EMA experiments. During the practice session, we alsoasked the speaker to use same declarative markers at the end of thesentence for the sake of consistency across speakers. While thispractice session may reduce spontaneity of the speech, we were stillconfident that with this elicitation strategy, we could elicit morenatural spoken utterances than those obtained in read speech.

    With respect to the IP-initial condition, by putting the fre-quency adverb /entPena/ (‘always’) just before the target word, anIP boundary was induced between them. (Without any specificinstruction, subjects tended to put the IP boundary after theadverb.) For the unfocused condition with the same IP-initial/pha/, the information locus was made somewhere else other thanthe target word (Table 1b), so that the target word was unfocused.

    For the IP-medial condition (¼the Word boundary condition)(Table 1c and d), the same focused versus unfocused conditionswere created in a similar way, but this time, an IP-medial prosodicboundary was induced by putting the target word as the secondmember in a compound (as in /kokuma phatPPen/ ‘sweet potatopancake’). Note that the preceding word /kokuma/ (‘sweetpotato’) ends with /a/ to be matched with the final vowel in/entPena/ (‘always’) in the IP-initial condition, to control thesegmental context before the target word.

    To induce different speaking styles (Clear versus Casual), thespeakers were first asked to speak casually as if they were talkingto their close friends. After the speakers’ casual style answer ineach mini discourse situation, they were then asked to repeat theanswer more clearly as if it were directed to a non-native speakerof Korean who had just started to learn Korean. In order tofacilitate the clear speech mode, the speakers were shown apicture of a person who looked like a non-Korean and theexperimenter pretended to be the person. In every trial, theexperimenter, with disguised voice mimicking a non-nativespeaker of Korean, told the speaker that he could not understandwhat had been said, and asked the speaker to repeat the sentence.

    Finally, in the experimental sentences, the test syllables/phi, pha, phu/ were always preceded by an /a/-final word (e.g.,entPena ‘always’ in yentPena phatPeny in IP-initial conditionand kokuma ‘sweet potato’ in ykokuma phatPeny in IP-medialcondition). This allowed us to examine how the domain-finalvowel /a/ before the test syllable is acoustically realized underdifferent speech styles and prosodic conditions.

    The entire corpus was repeated five times in a pseudo-randomizedorder as follows. Speakers produced sentences with three test words(/phatPen/, /phipu/, and /phutiF/) in a block, which was repeated fivetimes with different orders between the test words within the block.The order between IP-initial and IP-medial and that between focusedand unfocused conditions were also alternated. In total, 960 sentences(2 styles�2 boundaries�2 focus conditions�3 vowels�5repetitions�8 speakers) were collected and analyzed in thepresent study.

    2.3. Measurements

    2.3.1. Global measurements for clear versus casual speech

    Speaking rate: Speaking rate was included as it has been knownto be heavily influenced by the speech style, which modifies thetemporal structure of the sentence (e.g., Bradlow, Kraus, & Hayes,2003; Picheny et al., 1986; Smiljanić & Bradlow, 2005). It wasdefined as the number of syllables per second, which wascalculated based on the entire number of syllables over the entireutterance duration with silent pauses excluded.

    Global F0 measures (F0 peak, F0 minimum, and pitch (F0) range):Another clear speech effect has been found on the modification of

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361348

    pitch movements, primarily reflected in the extended pitch rangein utterances (e.g., Bradlow et al., 2003; Picheny et al., 1986;Smiljanić & Bradlow, 2005). To examine the global pitch mod-ification more comprehensively, we included three global F0measures: F0 peak, F0 minimum, and the pitch (F0) range overthe course of the entire utterance. F0 peak and F0 minimumvalues were taken from the vowel portions of the utterance—i.e.,F0 peak was taken from the vowel which has the highest F0 peakof all the vowels and F0 minimum was taken from the vowelwhich has the lowest F0 minimum of all the vowels, using a Praatscript. To ensure that local pitch perturbation was not included,each value obtained was cross-checked and corrected by inspect-ing the F0 contour of each utterance. Pitch (F0) range was thencalculated as a difference between F0 peak and F0 minimum ofthe utterance.

    The number of IPs (phrasing): To examine how prosodic phras-ing is modified as a function of speech style, the number of IPsformed in each utterance was counted by the two of the K-ToBItrained authors. The overall agreement rate between the twotranscribers was 92%. The remaining 8% of disagreed tokens werechecked by a third trained K-ToBI transcriber, so that the numberof IPs for these tokens was finalized when at least two out of threetranscribers agreed on it.

    2.3.2. Local measurements on post-boundary test words with

    /phi, pha, phu/

    Voice onset time (VOT): VOT for the aspirated stop /ph/ wasmeasured from the time point of the stop release burst to thevoice onset of the following vowel in domain-initial position. VOTfor stops has been known to be an important acoustic parameterthat reflects both prosodically driven (e.g., Cho & Jun, 2000; Cho &Keating, 2001; Jun, 1993, 1995) and communicatively drivenspeech variation (e.g., Smiljanić & Bradlow, 2008a).

    V1 and C2V2 durations: In order to examine how the temporalstructure of the test word is modified by the speaking style andprosodic factors, we measured the acoustic duration of the vowelof the first syllable (V1) (i.e., the interval from the onset of thevoicing of the vowel to the onset of the following consonant) andthe duration of the second syllable (C2V2) of the postboundarytest word. Here, we did not separate C2 and V2 of the secondsyllable because the test words contained different consonants.Note that VOT is often inversely correlated with the acousticduration of the following vowel as it takes over some portion ofthe supralarynageal vocalic opening articulation (cf. Cho, 1996).So where there is a robust lengthening effect on VOT, there islikely a reduced or a null lengthening effect on the followingvowel as some of the vocalic lengthening effect is saturatedwith VOT.

    F0 peaks of V1 and V2: F0 maximum values of V1 and V2 (i.e.,#C1V1C2V2) were measured. Each F0 peak value was taken fromthe acoustic vowel portion by inspecting each token visually toavoid local microscopic pitch perturbations that might occurimmediately after the onset consonant. According to the prosodicmodel of Seoul Korean (Jun, 1993, 1995, 2000), the basic intona-tional structure of Korean Accentual Phrase (AP) is THLH where ‘T’(tone) is phonologically determined—i.e., H is assigned when theonset consonant is either aspirated or tensed (see Kim & Cho,2009 for further discussion on the high frequency of the basicTHLH). As an AP is assumed to be embedded inside an IP (i.e., in astrictly layered prosodic structure), the first syllable of the testwords in IP-initial position is assigned an H due to the presence ofan initial aspirated stop while the second syllable receives an H bydefault. Therefore, our bisyllabic test word is expected to beassociated with an HH tonal pattern (with both the first and thesecond syllables associated with H) as the onset is the aspirated

    stop (/ph/). The variation of F0 peak in the first and the secondvowels was examined to see how the phonologized tonal patternsin the domain-initial test word are conditioned by speaking styleand focus. Here it should be noted that we did not examine effectsof boundary on F0 peaks of the test words because IP-initial andIP-medial tonal patterns were different in our corpus. The testword in IP-initial position started with HH while the test word inIP-medial position was the second member of the two-wordcompound, which formed a single AP. Therefore, IP-medial testwords were not produced with a comparable HH tonal pattern,which made it difficult to compare IP-initial versus IP-medial F0patterns directly.

    V1 and V2 intensity peaks: For the intensity variation of the testword, intensity peak values were taken from the acoustic inten-sity profile (dB SPL) of the first and the second vowels (V1, V2).Each intensity peak value obtained from the acoustic intensityprofile was visually inspected along with the display of thewaveform to ensure that the value was taken from the acousticvowel portion.

    Vowel formant measures (F1, F2, and the Euclidean area): Inorder to investigate how the vowel formant structure of thevowels in the test words changes according to different speakingstyles and prosodic factors, F1 and F2 of the test vowels (/i,a,u/)were taken from the steady-state region from the spectrographicdisplays. The values were taken at the hand-marked steady-statepoints by an LPC formant tracking function in Praat, along withvisual inspection of the spectrogram for each token, whichnecessitated some corrections of values obtained by the formanttracking function. Based on the obtained F1 and F2, the Euclideanarea of the triangle in the acoustic vowel space, formed with threecoordinates based on (F1, F2) of the three vowels, was made toquantify the modification of vowel space under different testconditions.

    2.3.3. Local measurements at the preboundary position

    The preboundary (domain-final position) has been known as aprosodically important location as it demarcates the end of aprosodic domain. The present study therefore included someacoustic measures taken from the preboundary (domain-final)syllable (/ma/ or /na/) to see how the prosodically importantdomain-final articulation is acoustically modified under differentspeech styles and prosodic conditions. In particular, it will beinteresting to see whether the preboundary articulation showsincreased spatial magnitude (as to be reflected in F1 and F2) inKorean, given that boundary strengthening often induces length-ening of the domain-final segments without increased spatialmagnitude in English (e.g., Beckman, Edwards, & Fletcher, 1992).

    Domain-final (preboundary) V duration: The final vowel dura-tion (which was always /a/ as in /kokuma/ and /entPena/) wasmeasured to see how domain-final vowels would be temporallymodified by the test conditions.

    Domain-final (preboundary) V intensity peak: As was the casewith V1 and V2 of the postboundary test words, this measure wastaken as the peak value of the acoustic intensity (dB SPL) duringthe final vowel /a/ to examine how it is influenced by the testconditions.

    F1 and F2 of domain-final /a/: To see how the final /a/’s formantstructure is modified by the test conditions, F1 and F2 were takenin the same way used for measuring the F1 and F2 of the vowelsin the postboundary (domain-initial) test words.

    2.4. Statistical analyses

    We conducted a series of repeated measures analyses ofvariance (RM ANOVAs) for statistical evaluation of the influenceof the communicatively driven speaking style and the prosodically

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 349

    driven prominence and boundary on the various acoustic mea-sures. The within-subject factors considered were Speaking style(Clear versus Casual), Boundary (IP-initial or IP-final versusIP-medial), and Prominence (Focused versus Unfocused). RMANOVAs with data pooled across eight speakers (with each speakercontributing one averaged score per condition) would returnsignificance only if most speakers contributed consistently to anyobserved variations. In reporting the results, we will focus on thegeneral patterns that are statistically consistent across speakers.When there were interaction effects between factors, we con-ducted posthoc pairwise comparisons with Bonferroni/Dunn cor-rections. Effect size was estimated by conducting Z2 analyses,which provide a measure of how much the observed variabilitycan be ascribed to a given factor, and therefore show how large theobserved effect might be (Sheskin, 2000: pp. 553–556). In allANOVAs, p-values less than 0.05 were considered significant.

    3. Results

    In this section, we will first report results with respect toglobal characteristics of clear versus casual speech over the entireutterance and then results on local effects of Speaking style (Clearvs. Casual) on the test words along with effects of Boundary (IPi orIPf vs. IPm) and Prominence (Focused vs. Unfocused).

    3.1. Global characteristics of clear versus casual speech

    One-way repeated measures ANOVAs showed that there was asignificant main effect of Speaking style on speaking rate(F[1,7]¼115.85, po0.001, Z2¼0.94), showing that utterances inclear speech were produced more slowly than those in casualspeech—i.e., speakers produced fewer syllables per second ascompared to casual speech (Fig. 1a). There was also a significantmain effect of Speaking style on prosodic phrasing (the number ofIPs, F[1,7]¼42.83, po0.001, Z2¼0.86), with more IPs in clearspeech than in casual speech (Fig. 1b). As shown in Fig. 1c–e,however, there was no main effect of Speaking style on any of thethree pitch measures (F0 peak, F0 minimum, Pitch (F0) range),showing that clear speech is not associated with either higherpitch or expanded pitch range.

    3.2. Local effects on the test words with /phi, pha, phu/

    VOT: VOT for the test consonant /ph/ showed main effects of allthree factors. VOT was longer in clear speech than in casual

    Fig. 1. Effects of Speaking style (clear versus casual) on global acoustic measures: Sperange (e) (*po0.05 and **po0.01).

    speech (F[1,7]¼16.35, p¼0.005, Z2¼0.7), longer in IP-initialposition than in IP-medial position (F[1,7]¼11.18, po0.05,Z2¼0.62), and longer when the test word was focused than whenunfocused (F[1,7]¼18.03, po0.005, Z2¼0.72), as shown inFig. 2a. A significant Speaking style�Prominence interactionwas found (F[1,7]¼17.53, po0.005, Z2¼0.48), which was in partdue to a more robust clear speech effect in the focused condition(mean diff. 12 ms, t(7)¼3.9, po0.01, Z2¼0.69) than in theunfocused condition (mean diff. 6 ms, t(7)¼3.51, po0.05,Z2¼0.64) (Fig. 2b). There was also a significant Boundary�Prominence interaction (F[1,7]¼17.53, po0.005, Z2¼0.72),which stemmed from two interrelated facts—i.e., the focus-induced VOT lengthening was reliable in IP-medial position(mean diff. 22 ms, t(7)¼5.24, p¼0.001, Z2¼0.8), but not inIP-initial position (mean diff. 6 ms, t(7)¼1.73, p40.1), and theboundary-induced VOT lengthening was reliable only in theunfocused condition (mean diff. 18 ms, t(7)¼3.97, p¼0.005,Z2¼0.69) (Fig. 2c).

    V1 duration: For the vowel (V1) duration, a significant maineffect was found only with Speaking style (F[1,7]¼17.28,po0.001, Z2¼0.87), showing longer V1 duration in clear speechthan in casual speech (Fig. 3a). Boundary and Prominence yieldedno significant main effects. The Speaking style effect, however,interacted with Boundary and Prominence (F[1,7]¼22.34,po0.005, Z2¼0.76, F[1,7]¼6.46, po0.05, Z2¼0.48, respectively).Posthoc comparisons showed that the interaction was due to amore robust clear speech effect found in prosodically strongpositions: The effect was more robust in IP-initial CV position(mean diff. 19 ms, t(7)¼7.33, po0.001, Z2¼0.89) than inIP-medial position (mean diff. 10 ms, t(7)¼4.95, po0.005,Z2¼0.78) (Fig. 3b), and in the focused condition (mean diff.16 ms, t(7)¼6.92, po0.001, Z2¼0.87) than in the unfocusedcondition (mean diff. 12 ms, t(7)¼5.86, p¼0.001, Z2¼0.83)(Fig. 3c). As shown in Fig. 3d, there was also a significantBoundary� Prominence interaction (F[1,7]¼6.21, po0.05,Z2¼0.47), which was attributable to a significant boundary effectin the focused condition (mean diff. 7 ms, t(7)¼3.37, po0.05,Z2¼0.62), but not in the unfocused condition (mean diff. 2 ms,t(7)¼�0.67, p40.1, Z2¼0.06).

    The second syllable (C2V2) duration: All three factors showedsignificant main effects on the second syllable (C2V2) duration(Speaking style, F[1,7]¼30.25, p¼0.001, Z2¼0.81; Boundary,F[1,7]¼70.03, po0.001, Z2¼0.91; Prominence, F[1,7]¼41.49,po0.001, Z2¼0.86), with a longer duration in clear (versuscasual), focused (versus unfocused), and IP-initial (versusIP-medial) conditions (Fig. 4a). Again, Speaking style interacted

    aking rate (a), Prosodic phrasing (b), F0 peak (c), F0 minimum (d), and Pitch (F0)

  • Fig. 2. Main effects of Speaking style, Boundary and Prominence on VOT of /ph/ (a), Speaking style�Prominence interaction (b) and Boundary�Prominence interaction (c)(*po0.05 and **po0.01).

    Fig. 3. Main effects of Speaking style, Boundary and Prominence on V1 duration (a), Speaking style�Boundary interaction (b), Speaking style�Prominence interaction (c),and Boundary�Prominence interaction (d) (*po0.05 and **po0.01).

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361350

  • Fig. 4. Main effects of Speaking style, Boundary and Prominence on the second syllable (C2V2) duration (a), Speaking style�Boundary interaction (b) and Speakingstyle�Prominence interaction (c) (*po0.05 and **po0.01).

    Fig. 5. Main effects of Speaking style and Prominence on V1 F0 peaks (a) and V2 F0 peaks (b). (*po0.05 and **po0.01.).

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 351

    with Boundary (F[1,7]¼12.97, po0.01, Z2¼0.65) and Prominence(F[1,7]¼6.61, po0.05, Z2¼0.49). As was the case with V1 dura-tion, the interaction was mainly due to a more robust clear speecheffect in prosodically strong positions: a more robust clear speecheffect was found when the test word was in IP-initial position(mean diff. 50 ms, t(7)¼5.79, p¼0.001, Z2¼0.83) than in IP-medial position (mean diff. 32 ms, t(7)¼4.64, po0.005,Z2¼0.76), and in the focused condition (mean diff. 46 ms,t(7)¼7.01, po0.001, Z2¼0.88) than in the unfocused condition(mean diff. 36 ms, t(7)¼4.07, p¼0.005, Z2¼0.7) (Fig. 4b and c).

    V1 and V2 F0 peaks: Both V1 and V2 F0 peaks showed no maineffect of Speaking style (F[1,7]o1 for both V1 and V2), but a

    significant main effect of Prominence (V1: F[1,7]¼119.36, po0.001,Z2¼0.95; V2: F[1,7]¼174.88, po0.001, Z2¼0.96)—i.e., it was higherwhen the test word was in the focused condition than in theunfocused condition (Fig. 5a and b). Interestingly, however, therewas no Speaking style effect, nor was there any interaction betweenfactors, showing no pitch modification due to Speaking style in thetest word.

    V1 and V2 intensity peaks: For V1 intensity peak, only theProminence factor showed a significant main effect (F[1,7]¼39.86,po0.001, Z2¼0.85): V1 intensity peak was greater in the focusedcondition than in the unfocused condition (Fig. 6a). However, therewas a significant Speaking style�Boundary interaction (F[1,7]¼9.44,

  • Fig. 6. Main effects of Speaking style, Boundary and Prominence on V1 Intensity peak (a) and V2 Intensity peak (b) (*po0.05 and **po0.01).

    Fig. 7. /i/-/a/-/u/ Euclidean area in Clear versus Casual speech (Speaking style) (a), IP-initial versus IP-medial position (Boundary) (b), and Focused versus Unfocusedposition (Prominence) (c).

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361352

    po0.05, Z2¼0.57), which was again due to the fact that the clearspeech effect did arise but only in a prosodically strong, IP-initialposition (mean diff. 1 dB, t(7)¼2.32, p¼0.05, Z2¼0.44), not inIP-medial position (mean diff. 0.1 dB, t(7)¼�0.17, p40.1,Z2¼0.01) (the figure is not given). There was also a significantBoundary�Prominence interaction (F[1,7]¼8.53, po0.05,Z2¼0.55), showing that a boundary effect existed only when V1was focused—i.e., the greater V1 intensity for IP-initial test wordsthan IP-medial ones in the focused condition (mean diff. 1.36 dB,t(7)¼3.37, po0.05, Z2¼0.62).

    As for V2 intensity peak, all three factors showed a significantmain effect (Speaking style, F[1,7]¼6.83, po0.05, Z2¼0.49;Boundary, F[1,7]¼19.22, po0.005, Z2¼0.73; Prominence,F[1,7]¼35.79, p¼0.001, Z2¼0.84). V2 intensity peak was higherin clear speech than in casual speech, higher when the test wordwas in IP-initial position than in IP-medial position, and higher in

    the focused condition than in the unfocused condition (Fig. 6b).No interactions between factors were found.

    F1 and F2 of V1: /a/ in /pha/: All three factors showed significantmain effects on F1 of /a/ (Speaking style, F[1,7]¼10.95, po0.05,Z2¼0.61; Boundary, F[1,7]¼33.47, p¼0.001, Z2¼0.83; Prominence,F[1,7]¼43.31, po0.001, Z2¼0.86). F1 was higher (thus positioning /a/ lower in the acoustic vowel space) in clear than in casual speech;higher in IP-initial CV position than IP-medial CV position, and higherin the focused condition than in the unfocused condition. There was aSpeaking style�Boundary interaction (F[1,7]¼5.37, p¼0.05,Z2¼0.43), which was again in part due to an asymmetrical clearspeech effect between IP-initial and IP-medial position: a significantlyhigher F1 (positioning /a/ lower in the vowel space) in clear speechwas found in IP-initial CV position (mean diff. 73 Hz, t(7)¼4.57,po0.005, Z2¼0.75), but no clear speech effect was observed inIP-medial position (mean diff. 23 Hz, t(7)¼0.8, p40.1, Z2¼0.08).

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 353

    F2 of /a/ showed a significant main effect only with Promi-nence (F[1,7]¼9.68, po0.05, Z2¼0.58)—i.e., F2 was higher (posi-tioning /a/ more advanced in the acoustic vowel space) in thefocused condition than in the unfocused condition. There was asignificant Boundary�Prominence interaction (F[1,7]¼6.18,po0.05, Z2¼0.47) due to the fact that the focus effect wassignificant only when the vowel was positioned IP-medially(mean diff. 83 Hz, t(7)¼3.86, po0.01, Z2¼0.68).

    F1 and F2 of V1: /i/ in /phi/: No factor showed a significant effect onF1 of /i/, but there was a significant interaction between Boundaryand Prominence (F[1,7]¼5.71, po0.05, Z2¼0.45), which was due toa significant boundary effect only in the focused condition: showingboundary-induced lower F1 (positioning /i/ higher in the vowelspace) only in the focused condition (focused: mean diff. 25 Hz,t(7)¼�2.43, po0.05, Z2¼0.46; unfocused: mean diff. 4 Hz,t(7)¼0.53, p¼0.61, Z2¼0.04). Unlike F1, however, F2 showed sig-nificant main effects of all three factors (Speaking style, F[1,7]¼32.94,p¼0.001, Z2¼0.83; Boundary, F[1,7]¼17.51, po0.005, Z2¼0.71;Prominence, F[1,7]¼11.59, po0.05, Z2¼0.62), showing that F2 washigher (positioning /i/ more advanced in the acoustic vowel space) inclear than in casual speech, in IP-initial than in IP-medial position,and in the focused condition than in the unfocused condition. Nobetween-factor interactions were found.

    F1 and F2 of V1: /u/ in /phu/: As was the case with /i/, no factorshowed a significant main effect on F1 of /u/. There was nobetween-factor interaction, either. On the other hand, F2 of /u/showed significant main effects of Speaking style (F[1,7]¼5.8,p¼0.05, Z2¼0.49) and Prominence (F[1,7]¼14.56, po0.01,Z2¼0.71) with no Boundary effect (F[1,7]o1): F2 was lower(showing a more retracted vowel quality in the vowel space) inclear than casual speech, and it was also lower in the focusedcondition than in the unfocused condition. No between-factorinteractions were found with F2.

    Fig. 8. Main effects of Speaking style and Boundary on the final V (/a/) du

    /i/-/a/-/u/ Euclidean Area: All three factors showed significant maineffects on /i-a-u/ Euclidean area (Speaking style, F[1,7]¼29.47,po0.005, Z2¼0.83; Boundary, F[1,7]¼26.73, po0.005, Z2¼0.82;Prominence, F[1,7]¼34.71, p¼0.001, Z2¼0.85). As shown in Fig. 7,the Euclidean area was larger in clear than in casual speech (Fig. 7a),larger in IP-initial than in IP-medial position (Fig. 7b), and larger inthe focused condition than in the unfocused condition (Fig. 7c). (Notealso that the main effects on F1 and F2 for all three test vowels can beinferred from Fig. 7.) There was a significant interaction effect ofSpeaking style and Boundary (F[1,7]¼14.16, po0.01, Z2¼0.7), whichwas in part due to a more robust clear speech effect in IP-initial(mean diff. 52655 Hz2, t(7)¼5.51, p¼0.001, Z2¼0.81) than inIP-medial position (mean diff. 21813 Hz2, t(7)¼3.31, po0.05,Z2¼0.61) (the figure is not given).

    3.3. Local effects on preboundary /a/

    Final vowel (/a/) duration in preboundary position: Results ofrepeated measures ANOVAs showed significant main effects ofSpeaking style and Boundary on the final vowel duration(F[1,7]¼19.56, po0.01, Z2¼0.69; F[1,7]¼114.43, po0.001,Z2¼0.94, respectively), showing that the final vowel was longerin clear speech than in casual speech and it was longer in IP-finalthan in IP-medial position (Fig. 8a). There was, however, asignificant Speaking style x Boundary interaction (F[1,7]¼9.75,po0.05, Z2¼0.58), due to the fact that the clear speech effect size(i.e., mean difference) was larger in IP-final position than inIP-medial position (IP-finally: mean diff. 137 ms, t(7)¼3.57,po0.01, Z2¼0.65; IP-medially: mean diff. 24 ms, t(7)¼6.9,po0.001, Z2¼0.87) (Fig. 9).

    Final vowel (/a/) intensity: The intensity peak of the final vowel(/a/) showed a significant Boundary effect (F[1,7]¼23, po0.005,

    ration (a), Intensity (b), F1 (c), and F2 (d) (*po0.05 and **po0.01).

  • Fig. 9. Speaking style�Boundary interaction on Final V (/a/) duration (*po0.05and **po0.01).

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361354

    Z2¼0.94)—i.e., it was smaller in IP-final position than in IP-medialposition (Fig. 8b). There was no effect of Speaking style, nor wasthere a significant Speaking style�Boundary interaction.

    F1 and F2 of final vowel (/a/): There were significant maineffects of Speaking style and Boundary on F1, which was higher(positioning the final /a/ lower in the vowel space) in clear speechthan in casual speech (670 Hz vs. 641 Hz; F[1,7]¼14.67, po0.01,Z2¼0.68) and higher in IP-final than IP-medial position (671 Hzvs. 641 Hz; F[1,7]¼15.03, po0.01, Z2¼0.68) (Fig. 8c). F2 alsoshowed significant effects of Speaking style (1336 Hz vs. 1290 Hz;F[1,7]¼10.32, po0.05, Z2¼0.6) and Boundary (1383 Hz vs.1242 Hz; F[1,7]¼69.9, po0.001, Z2¼0.91). F2 was significantlyhigher (positioning /a/ more advanced in the acoustic vowelspace) in clear than in casual speech, and it was higher in IP-finalthan in IP-medial position (Fig. 8d). No interactions betweenfactors were found.

    4. Summary and discussion

    In this section, we will first summarize the main findingsabout communicatively driven hyper-articulation (clear speecheffects) versus prosodically driven hyper-articulation (boundaryand prominence effects) in Korean. We will then discuss theirimplications in terms of specific questions that were raised at theoutset of the paper.

    4.1. Communicatively driven hyper-articulation: global and local

    clear speech effects

    4.1.1. Global clear speech effects

    The results showed global clear speech effects (i.e., effects overthe course of the entire utterance) in Korean most clearly intemporal dimension and prosodic phrasing. The clear speech modeinduced a decrease in overall speaking rate (i.e., fewer syllablesper second) and an increase in the number of prosodic phrases (i.e.,more IPs per sentence). This builds on and extends previous findingsof clear speech effects in Indo-European languages such as English,Spanish, and Croatian (cf. Bradlow, 2002; Picheny et al., 1986;Smiljanić & Bradlow, 2005, 2008a, 2008b; Uchanski, 2005) to anew language, Korean, which is typologically and prosodicallydifferent. The results thus confirmed the cross-linguistic tendencythat the communicatively driven global hyper-articulation gives riseto modification of the temporal and prosodic structure of theutterance to accommodate listeners in difficult communicativesituations (in this case those with limited experience in the testlanguage). The global slowing down of the utterance would allowmore time for the listeners to process the speech signal, and placing

    a greater number of major prosodic boundaries is likely to enhancelexical segmentation (e.g., Christophe, Peperkamp, Pallier, Block, &Mehler, 2004; Kim & Cho, 2009), which, taken together, function toenhance the overall intelligibility of the utterance. The presentstudy, however, showed an interesting pattern which runs counterto previously observed clear speech effects in English and Croatian(e.g., Smiljanić & Bradlow, 2005, 2008a)—i.e., the clear speech modein Korean had no effect on the pitch range over the course of thesentence (see Section 4.5.3 for further discussion on this point).

    4.1.2. Local clear speech effects

    Along with the global clear speech effects, we have also examinedthe extent to which syllables locally positioned in the vicinity ofprosodic juncture are acoustically modified as a function of Speakingstyle. Results showed that clear speech triggered local temporalexpansion, as reflected in lengthened preboundary vowel /a/ andpostboundary syllables (both the first and the second syllables) aswell as lengthened VOT of the aspirated stop /ph/. In addition, wehave also found evidence for clear speech-induced spatialexpansion—i.e., clear speech was associated with more extreme F1and/or F2 values of each of the three peripheral test vowels /i,a,u/along with expansion of the F1-F2 vowel space formed by the vowels.Notably, /a/ was produced with higher F1, suggesting a more loweredvowel quality (interpretable as enhancement of [+low] for /a/); /i/was produced with higher F2, showing a more advanced vowelquality (interpretable as enhancement of [�back] for /i/); and /u/showed lowered F2, showing a retracted vowel quality (interpretableas enhancement of [+back] for /u/). These individual clear speechpatterns converged on the expansion of the /i-a-u/ Euclidean (trian-gular) area, showing the expansion of the acoustic vowel space in aclear speech mode in line with Smiljanić and Bradlow (2005). Finally,as was the case with the global clear speech effects, there was nonotable clear speech effect on pitch in the local test syllables (i.e., thedomain-final and domain-initial syllables).

    4.2. Prosodically driven hyper-articulation: effects of boundary and

    prominence (focus)

    4.2.1. Boundary effects

    The results showed substantial final lengthening for /a/ inpreboundary position—i.e., the final /a/ (in /na/ or /ma/) waslonger in IP-final position than in IP-medial position, which is inline with the general cross-linguistic phrase-final lengtheningpattern (e.g., Cho & Keating, 2001; Edwards, Beckman, &Fletcher, 1991; Klatt, 1975; Wightman, Shattuck-Hufnagel,Ostendorf, & Price, 1992). In addition to the final lengtheningeffect, we have also found lowering of domain-final /a/ (withhigher F1) in the acoustic vowel space. These results suggest thatdomain-final articulation undergoes some sort of hyper-articula-tion in both spatial and temporal dimensions (see Section 4.5.2for related discussion).

    The present study also found robust domain-initial strengtheningeffects in both temporal and spatial dimensions. The test consonant/ph/ was produced with longer VOT IP-initially than IP-medially,which can be interpreted as having resulted from strengthening ofglottal abduction articulation (cf. Cho & Keating, 2001; Pierrehumbert& Talkin, 1992). Regarding the vowels after the consonant, we havefound mixed domain-initial strengthening effects. On the one hand,we have not observed boundary-induced lengthening of the vowelafter the initial consonant, which is consistent with Cho and Keating(2001), who tentatively concluded that the domain-initial strength-ening was limited to the initial consonant in Korean as with otherlanguages such as English and French (Fougeron, 2001; Fougeron &Keating, 1997). On the other hand, we have found some evidence thatdomain-initial strengthening is indeed extended beyond the initial

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 355

    consonant in Korean. The test vowels /i/ and /a/ were produced withspatial expansion when they were produced in IP-initial CV syllablesas reflected in their F1 and F2 values. Notably, F2 of /i/ was higher inIP-initial CV syllables than IP-medial CV syllables, positioning /i/ moreadvanced, and F1 of /a/ was higher, positioning /a/ lower in the F1-F2vowel space. The boundary-induced spatial expansion in initial CVsyllables was further evident with the expansion of the vowel spaceas reflected in the enlarged Euclidean area (see Section 4.5.1 forfurther discussion on this point).

    In addition to initial strengthening effects on the vowel of thefirst syllable, the results also showed that the domain-initialeffect can spread even into the second syllable of the test words.The V2 intensity peak and the second syllable duration increasedwhen the test word was positioned IP-initially than IP-medially.

    Before we move on, it is worth considering another interestingfinding regarding domain-initial effects—i.e., there was no sig-nificant main effect of Boundary on the initial vowel’s (V1)duration even though the duration of the second syllable wassignificantly longer in IP-initial position. At first glance, thislengthening effect on the second syllable may look inconsistentwith the lack of V1 lengthening. It looks as if the rightwardtemporal influence bypassed the first vowel and affected thesecond syllable. Furthermore, Cho and Keating (2001) suggestedthat domain-initial strengthening is closely related with length-ening, such that in prosodically weak position, reduced time mayinduce articulatory target undershoot, but in prosodically strongposition with enough time allowed, the target would be fullyreached (cf. Lindblom, 1963). One might therefore expect that thevowel space expansion that was found with V1 would go hand inhand with temporal expansion of the vowel, which has not beenobserved in the present study.

    These seeming paradoxes, however, can be understood when weconsider the sum of VOT and the duration of the following vowel,which is significantly longer IP-initially than IP-medially. VOT for theaspirated stop was on average by far longer than the following vowelduration (76.4 ms vs. 47.9 ms). Given this durational asymmetrybetween VOT of the aspirated stop and the following vowel, Cho(1996) characterized VOT as the ‘voiceless’ vowel as it takes over asubstantial part of the interval of the supralaryngeal articulation dueto a long voicing lag. This can be further understood from thekinematic point of view. It is well known that in CV articulation thevocalic gesture starts well before the consonant release (cf. Browman& Goldstein, 1990, 1992), and the interval of VOT indeed overlapsfully with the vocalic movement duration of the gesture for the voweltarget. It is therefore reasonable to assume that much of the acoustictemporal effect on the first syllable is saturated with VOT, and henceno temporal expansion is observed in the vowel. Nonetheless, thearticulatory movement duration of the domain-initial vocalic gesturehas been found to be lengthened in Korean (Cho, Yoon, & Kim, inpreparation), which implies that the acoustic vowel expansion mayindeed be closely correlated with the kinematic duration of thevocalic gesture, which is masked acoustically by VOT.

    4.2.2. Prominence (focus) effects

    As with initial strengthening effects, the VOT of /ph/ and thesecond syllable of the postboundary test words were lengthenedin the focused condition, which suggests that the duration of theentire word is influenced by prominence. Similar to the findingswith boundary effects, the V1 of the test words did not showprominence-induced temporal expansion. (Again the lack of thefocus effect on V1 duration could be accounted for by the inverserelationship between VOT and the following vowel duration, asdiscussed in the preceding section.) As was with initial strength-ening, spatial expansion was found in the focused condition: F1and F2 of /a/ were higher (indicating a more lowered and

    advanced vowel quality in the acoustic vowel space); F2 of /i/was higher (indicating a more advanced vowel quality); F2 of /u/was lower (indicating a more retracted vowel quality) in thefocused condition than in the unfocused condition. The /i-a-u/Euclidean area was significantly greater in the focused conditionthan in the unfocused condition as well, showing prominence-induced vowel space expansion. There was also an increase in F0peaks for V1 and V2 in the focused condition. Finally, unlike theclear speech and boundary strengthening effects, the focusedcondition was associated with an increase in both V1 and V2intensity peaks.

    4.3. Interaction effects

    4.3.1. Interactions between communicatively driven and

    prosodically driven factors

    One of the questions that the present study has endeavored toanswer was whether communicatively driven hyper-articulation(induced by clear speech) interacts with, or is conditioned by,prosodically driven hyper-articulation associated with boundaryand prominence, and if so, how. The results indeed showed thatthe former interacts significantly with the latter in severalacoustic measures. First, there were significant interaction effectsbetween Speaking style and Boundary on temporal measuressuch as V1 duration and duration of the second syllable, andnon-temporal measures such as V1 intensity and the overall/i-a-u/ Euclidean area. These interactions stemmed from a con-sistent pattern that the communicatively driven hyper-articula-tion was more robust when segments occurred in IP-initial CVsyllables than IP-medial CV syllables, except for V1 intensitywhich showed a significant clear speech effect only when thevowel was in IP-initial CV syllables. A similar Speaking sty-le�Boundary interaction was found with the duration of thepreboundary (domain-final) /a/, again showing a more robustclear speech effect IP-finally than IP-medially. Second, Speakingstyle also interacted with Prominence on VOT, V1 duration,duration of the second syllable, and V1 intensity, which alsoshowed a more robust clear speech effect in the focused conditionthan in the unfocused condition, except for V1 intensity whichshowed a significant clear speech effect only when the test wordwas in the focused condition. These interaction effects suggestthat the clear speech effects are conditioned by prosodic factors ina way that segments positioned in prosodically strong locationsare weighted more (see Section 4.4 for further discussion on thispoint).

    4.3.2. Interactions between the two prosodic factors, boundary and

    Prominence

    The results also showed some interaction effects between twoprosodic factors, Boundary and Prominence, with three acousticmeasures: V1 duration, V1 intensity, and VOT. Among thesemeasures, V1 duration and V1 intensity showed no main effectof Boundary, but significant boundary effects were found onlywhen the test word was in the focused condition. The interactioneffect on VOT, however, revealed the opposite pattern. Theboundary-induced VOT difference (longer IP-initially thanIP-medially) turned out to be significant, this time in the proso-dically ‘weak’ condition, that is, only when the test word was‘unfocused.’ The lack of boundary effect on VOT in the focusedcondition can be interpreted as coming from a ceiling effectassociated with the focused condition. In fact, among the threefactors (Speaking style, Boundary, and Prominence), the Promi-nence factor induced the largest VOT difference (86 ms in thefocused condition versus 69 ms in the unfocused condition, with17 ms mean diff.). It is then plausible that there is no room left in

  • T. Cho et al. / Journal of Phonetics 39 (2011) 344–361356

    the focused condition for additional lengthening of VOT comingfrom boundary.2 This ceiling effect account is not new, but isconsistent with Cho (2006) who reported that />/ in domain-initial CV in English showed spatial expansion only when unac-cented while no such effect was found in accented condition,which was interpreted as coming from a ceiling effect.3

    F1 and F2 measures for the three test vowels also showedsome Boundary by Prominence interactions, but the patternswere inconsistent. For /i/, only F1 showed a significant Boundaryby Prominence interaction due to the fact that boundary-inducedlower F1 (positioning /i/ higher in the vowel space) was signifi-cant only in the prosodically strong, focused, condition, thusbeing consistent with the general Boundary by Prominenceinteraction pattern found with V1 duration and V1 intensity. For/a/, however, F2 showed a significant Boundary by Prominenceinteraction, but this time it was due to a significant focus effectonly in the prosodically weak, IP-medial position. Other thanthese two interaction effects, no other formant measures showedcomparable interaction effects.

    Considering all these Boundary�Prominence interactioneffects, although it is hard to make an across-the-board general-ization about the direction of the interactions, the results dosupport the view that the two prosodically driven factors are notentirely independent, but that they do influence each other, aswas observed with English (cf. de Jong, 2004; Cho & Keating,2009). In particular, as far as the two general acoustic measures,V1 duration and V1 intensity, and one /i/-specific F1 measure areconcerned, it appears that boundary effects on these measures arenot fully realized when the test words occur in the prosodicallyweak, unfocused, condition, but they are reinforced in theprosodically strong, focused, condition.

    4.4. Similarities and differences between the communicatively

    driven and the prosodically driven hyper-articulation

    Another important question that the present study aimed toanswer was whether and how the speakers would differentiatebetween the communicatively driven versus the prosodicallydriven hyper-articulation, and between the boundary-inducedversus the prominence-induced hyper-articulation.

    Let us first consider the similarities. Clear speech effects werefound to be similar to prosodic strengthening effects (due toboundary and prominence) in various aspects in both temporaland spatial dimensions. First, all three factors gave rise totemporal expansion in strong environments (i.e., in clear speech,focused and IP-initial conditions), specifically with longer VOTand longer duration of the second syllable of the test words.Second, all three factors showed some degree of increase inloudness of the test words, especially on V2 of the test wordswhose intensity peak was augmented in clear speech, focused anddomain-initial (IPi) conditions. Third, all three factors inducedexpansion of the acoustic vowel space, as reflected in some F1 and

    2 Silva (2006) reported that mean VOT values of aspirated stops produced by

    young Koreans (born after 1980) were centered around 70 ms when stops were

    produced in a frame sentence. Given that our speakers were the same generation

    as those in Silva (2006), it is reasonable to assume independently that the mean

    VOT of 86 ms observed in the focused condition in the present study is already

    exceptionally long, leaving no much room for further expansion.3 Another similar ceiling effect could be found in Smiljanić and Bradlow

    (2008a). They found that although the durational contrast between the tense and

    the lax vowels in English was larger in clear speech in domain-final position

    compared to domain-medial position, the durational difference of the vowel as a

    function of the following consonant’s voicing was not larger in clear speech. They

    suggested that the lack of clear speech effects in the latter case was due to a limit

    to the amount of combined lengthening effects of domain-final position and clear

    speech.

    F2 values of the three test vowels which converged on theenlarged /i-a-u/ Euclidean area.

    Although the three kinds of hyper-articulation with differentsources have common acoustic properties, we have also seen thatthey differ in some other acoustic aspects. Most notably, thecommunicatively driven hyper-articulation was distinct fromprosodically driven hyper-articulation in that while the F0 peaksfor the first and the second vowels of the test word (C1V1C2V2)were higher in IP-initial (than IP-medial) and in focused (thanunfocused) conditions, no increase in F0 peak and pitch range wasfound in clear speech condition. Furthermore, only the commu-nicatively driven (clear speech-induced) hyper-articulationinduced an increase in V1 duration across the board withoutfurther interactions with other factors. Finally, although all threekinds of hyper-articulation were associated with the vowel spaceexpansion, the sources of the vowel expansion were different tosome degree. As for /a/, all three factors induced higher F1 of /a/(indicating a more lowered vowel quality, and therefore thegreater mouth opening), but F2 of /a/ was influenced only bythe Prominence factor, being higher in the focused condition thanin the unfocused condition. As for /i/, the Boundary� Prominenceinteraction showed that F1 of /i/ was lower (therefore raised inthe vowel space) in IP-initial CV condition than IP-medial CVcondition, but again only when it occurred in the focusedcondition. As for /u/, F2 differentiated the boundary effect fromthe clear speech and the prominence effects, in that /u/ wasassociated with lower F2 (indicating a more retracted vowelquality) in clear speech and focused conditions, but not inIP-initial CV condition.

    Regarding boundary-induced versus prominence-inducedhyper-articulation patterns, there are a few acoustic parametersthat still make them distinct. One clear difference was in thepresence or absence of the main effect on V1 intensity—i.e., onlythe focused condition gave rise to an increase in V1 intensity.Another dimension in which they differed was F2 for /u/ and /a/,which was influenced by Prominence but not by Boundary: /u/was associated with lower F2 (thus more retracted in the vowelspace) in the focused condition, and /a/ was produced with higherF2 (thus more advanced in the acoustic vowel space) in thefocused condition.4

    The similarities and differences between three kinds of hyper-articulation and their interactions have implications for how thehypo- and hyper-articulation continuum is employed by thespeaker in order to achieve communicatively driven and proso-dically driven articulatory goals.

    First, the similarities suggest that although the sources ofhyper-articulation are in principle different, the three kinds ofhyper-articulation can be characterized by some common hyper-articulation patterns, converging on one common goal—i.e.,heightening of phonetic clarity, whether communicatively drivenin the sense of the H&H theory or prosodically driven in the senseof prosodic strengthening, in order to get the intended linguisticmessage across to the listener successfully. The phonetic clarity

    4 In English, it has been suggested that the low back vowel />/, which may bespecified with [+low, +back], is produced with lower F2 in prosodically strong

    environments, suggesting that the feature [+back] is enhanced (Cho, 2005). Unlike

    English />/, however, Korean /a/ is rather central (somewhat fronted) (Yang, 1996).We did not, therefore, have a priori prediction on how the place feature of /a/ in

    the backness dimension would be enhanced. Given that the F2 value in the

    focused condition was 1484 Hz, positing /a/ more fronted as opposed to 1430 Hz

    in the unfocused condition, and given that focus can be used as a diagnostic for

    what phonetic content is used to mark contrastiveness of the phoneme in a given

    language (e.g., de Jong & Zawaydeh, 2002; de Jong, 2004), the pattern suggests that

    the backness is not an important feature for /a/ in Korean. The more fronted

    quality of /a/ under focus may then be interpreted as reinforcing its centrality

    when focused.

  • 5 de Jong and his colleagues (de Jong, 1991, 1995, 2004; de Jong & Zawaydeh,

    2002; Silbert & de Jong, 2008) discuss how the communicative effectiveness can

    be achieved by hyper-articulation. For example, based on the finding that the

    durational difference of the vowel before a voiced versus a voiceless consonant is

    enhanced in English under stress, but not in Arabic, they proposed that the

    durational difference as a function of voicing of the following consonant is

    ‘‘linguistically specified’’ in English but not in Arabic, and that the specified

    phonetic contents of phonological contrast are most likely to be subject to hyper-

    articulation (de Jong & Zawaydeh, 2002; de Jong 2004). That is, what phonetic

    content is enhanced under hyper-articulation conditions is modulated by the

    linguistic roles of the phonetic content in a given language.

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361 357

    obtained in a clear speech mode can be taken to enhance theintelligibility of the utterance for the successful delivery of theglobal linguistic message; the phonetic clarity of segments atedges of prosodic domains can be understood as marking proso-dic boundaries for the successful delivery of prosodic phrasinginformation; and the phonetic clarity of focused words can beconsidered to highlight the semantic message embedded in theinformational structure. After all, all three types of hyper-articu-lation can be taken together to eventually facilitate speechcomprehension by virtue of similar patterns of articulatorystrengthening. In this sense, the same hypo- to hyper-articulationcontinuum, which was originally thought to be effectively usedfor communicative purposes in the H&H theory (Lindblom, 1990),is also employed for prosodically driven ‘localized’ hyper-articu-lation (see relevant discussion in the Introduction section).

    Second, despite the remarkable similarities arising with com-municatively driven versus prosodically driven hyper-articula-tion, differences between them suggest that the different sourcesof hyper-articulation are indeed differentially manifested inphonetic output, supporting the view that they are phoneticallyencoded separately in speech planning (cf. Cho, in press; Cho &Keating, 2009; Keating, 2006; Keating & Shattuck-Hufnagel,2002). However, the between-factor interactions imply that theseparate encoding of different sources of hyper-articulation is notentirely independent, but that they are closely intertwined. Mostnotably, we have seen evidence that the communicatively drivenhyper-articulation is prosodically modulated: When speakers arein need of enhancing their intelligibility in adverse speakingcondition, for example, they do not simply hyper-articulate everysegment in the utterance to an equal degree, but rather theyappear to make more efforts to heighten the phonetic clarity ofsegments that are positioned in prosodically importantlandmarks—i.e., in domain-initial position or in focused words.This is indeed comparable to the conclusion made with regard tothe interaction between clear speech and stress in English andCroatian in Smiljanić and Bradlow (2008a)—i.e., ‘‘a major featureof clear speech production (and a source of its increased intellig-ibility) is at the level of prosodic structure’’ (p. 108), which wasbased on their finding that clear speech effects were morerobustly manifested in stressed syllables in both English andCroatian. This is also similar to the interaction effects betweenstress and focus reported in de Jong (2004), which showed thatfocus effects are mediated by stress in English, such that thevowel duration difference due to the following voicing contrast isenhanced under focus, but the effect is localized in stressedsyllables. The heightened phonetic clarity of segments that arisesat the prosodic boundary and with prominence is likely to helplisteners with prosodic parsing, which in turn facilitates speechcomprehension (Cho et al., 2007; Christophe et al., 2004, Cutler &Butterfield, 1992; Gow et al., 1996; Pitt & Samuel, 1990).

    In short, the communicatively driven hyper-articulation can betaken to be prosodically modulated, in such a way that speakershyper-articulate segments in selective and economical ways,taking into account the role of positions of segments in a givenprosodic structure, which in turn increases effectiveness of speechcomprehension on the listener’s part. Note that this also has abroader implication for the H&H theory (Lindblom, 1990). In theoriginal version of the H&H theory, the economy of articulatoryeffort was discussed as a speaker-oriented driving force for hypo-articulation in a casual communicative situation. We propose thatthe general principle of gestural economy can also apply to thecommunicative situation, which requires hyper-articulation (in aclear speech model), so that all else being equal, speakers makemore efforts to strengthen prosodically important segments fromwhich listeners would benefit more, while avoiding undueexpenditure of effort by not putting too much effort to hyper-

    articulate segments in prosodically less important locationswhose strengthening would not give rise to the same degree ofcommunicative effectiveness for a given amount of effort.5

    4.5. Issues on the role of language-specific prosodic/phonological

    structure on hyper-articulation

    The present study also aimed to explore how language-specificprosodic and phonological system would influence the way thatthe hyper-articulation effect, whether communicatively or proso-dically driven, is phonetically realized. In this section, we discussthe results that bear directly and indirectly on this issue.

    4.5.1. Relationship between the size of the phonological inventory

    and the degree of vowel space expansion

    One of the questions discussed at the outset of the paper wasabout the relationship between the size of the phonological vowelinventory and the degree of vowel space expansion in hyper-articulation environments. Given that Korean has a relatively smallvowel inventory size with 7 contrastive vowels, and also given thatthe three test vowels /i,a,u/ are positioned separately in each sectionin the vowel space (high front, low central, and high back sections,respectively for /i,a,u/) with no adjacent vowel in that section, theprinciple of effort minimization (gestural economy) (cf. Liljencrants& Lindblom, 1972; Lindblom, 1986, 1990; Bradlow, 1995, 1996)would lead to no substantial vowel space expansion even in asituation where hyper-articulation is called for in Korean. We found,however, substantial expansion of the vowel space in all three typesof hyper-articulation conditions. Although no decisive conclusioncould be made until Korean data can be directly compared with dataof other languages which have more crowded vowel spaces, theresult, especially regarding the clear speech-induced vowel spaceexpansion, is in line with the cases with English and Croatian(Smiljanić & Bradlow, 2005). It appears that speakers strive tomaximize perceptual contrasts between vowels as much as possiblein adverse communicative situations, overriding the general princi-ple of effort minimization, which would otherwise function in not-so-adverse communicative situations. The vowel space expansionthat arises especially with clear speech can therefore be thought ofas one of the common strategies employed by speakers acrosslanguages to enhance their intelligibility, independently of vowelinventory size.

    4.5.2. The scope of boundary-induced strengthening

    Another question that is related to language-specificity washow far the boundary-induced strengthening effect can beextended beyond the segments immediately adjacent to theboundary, especially to the right of the prosodic boundary inKorean. Studies documented in the literature have so far shownmixed results on this (Barnes, 2002; Byrd, 2000; Byrd et al., 2006;Cho, 2006, 2008; Cho & Keating, 2001, 2009; Cho & McQueen,2005; Fougeron & Keating, 1997; Keating et al., 2003; Krivokapić,2007), but with a general consensus that domain-initial strength-ening is not robust after the initial consonant. The present study

  • 6 Given the greater mouth opening associated with domain-final /a/ as

    reflected in lower F1, one might expect that /a/ would be louder domain-finally

    than domain-medially, as is predicted by sonority expansion (cf. Beckman et al.,

    1992; de Jong, 1995). Interestingly, however, we found the opposite—i.e., the peak

    intensity of the final vowel /a/ was weaker in IP-final position than in IP-medial

    position. This, however, can be accounted for in terms of respiratory declination.

    In general, F0 and acoustic amplitude tend to decline gradually over the course of

    utterances as a result of declination of the subglottal (pulmonary) air pressure

    (Gelfer, 1987; Gelfer, Harris, & Baer, 1987; ‘t Hart, Collier, & Cohen, 1990; Ladd,

    1984). Given that the declination of subglottal and laryngeal articulation is

    generally independent from the supralaryngeal declination (Krakow, Bell-Berti,

    & Wang, 1994), the domain-final vowel may well be strengthened in terms of

    supralarygneal articulation (as reflected in higher F1 for /a/), but weakened in its

    loudness due to the general respiratory declination.

    T. Cho et al. / Journal of Phonetics 39 (2011) 344–361358

    however showed that domain-initial strengthening in Korean isindeed extended into the vowel after the initial consonant asreflected in extreme F1 and F2 values and its accompanying vowelspace expansion, and even beyond the first syllable as reflected inthe increased V2 intensity and the longer second syllable dura-tion. This suggests that as far as Korean is concerned, the scope ofdomain-initial strengthening is at least as large as the initialsyllable, possibly including the second syllable (i.e., the entirebisyllabic test word). This extension of the domain-initialstrengthening effect in Korean is clearly different from what hasbeen found with English (e.g., Cho & Keating, 2009; Fougeron &Keating, 1997), which showed no robust evidence on the domain-initial strengthening effect beyond the initial consonant.

    The apparent cross-linguistic difference between Korean andEnglish prompts the question: What factors would influencedetermination of its domain of influence? Although these ques-tions cannot be answered directly by the experimental findings ofthe present study, a possible determining factor that can bethought of is language-specific prosodic systems. Given thepossibility that the lack of initial strengthening effects onthe postconsonantal vowel in English is attributable to the role ofthe nucleus vowel which, in English, is arguably reserved for thestress-induced prominence realization (Barnes, 2002), our resultssuggest that languages without lexical stress and pitch accent suchas Korean are associated with more robust domain-initial strengthen-ing effects as its domain of influence is not restricted by the lexicalprominence system (cf. Keating et al., 2003).

    These results also have implications for the prosodic boundarygesture model of Byrd and Saltzman (Byrd, 2000, 2006; Byrd et al.,2000, 2006; Byrd & Saltzman, 2003; Saltzman, 1995). They proposedthat the boundary influence on articulation can be understood as aresult of the influence of so-called ‘p-gesture’ that is governed byprosodic constituency in the task dynamics model (e.g., Saltzman &Munhall, 1989). It is an abstract ‘prosodic’ gesture which determinesarticulatory movement speed by modulating the rate of the clock,which in turn controls articulatory activation of constriction gestures.The rate of the clock is assumed to decrease in proportion tothe boundary strength, and as a result the articulatory movementat the juncture becomes slower, and possibly spatially larger. In thetemporal domain, the p-gesture is anchored at a prosodic boundary,such that its clock-slowing effect is stronger at the juncture, dwind-ling farther from the edge. Although it appears extremely complicatedto understand the complex results of the present study in terms ofthe local clock-slowing mechanism currently proposed in the theoryof p-gesture, to the extent that the theory works, the results of thepresent study suggest that the scope of the influence of the p- gesturemay vary from language to language, reflecting the language-specificprosodic system. For example, some languages such as Korean mayoperate on a larger scope when their prosodic structure is signaledprimarily by boundary marking without restrictions associated withlexical stress and pitch accent, while languages such as English mayhave a reduced scope when the boundary marking is restricted by thesystem for prominence marking (cf. Barnes, 2002; Cho & Keating,2009), limiting initial strengthening effects to the edges of prosodicdomains. Much more work appears to be needed to determine howthe theory of p-gesture explains the complicated patterns that arisewith interactions of boundary and prominence effects and how itsscope is modulated by language-specific prosodic systems.

    The cross-linguistic differences in prosodic systems alsoprompt another important question about whether the boundaryeffect to the left (on the domain-final vowel) in Korean gives riseto more robust strengthening effects as compared withEnglish. Beckman et al. (1992) found that in English />/ in ‘‘pop,opposing’’ (domain-final position) was more lengthened com-pared with />/ in ‘‘poppa, posing’’ (domain-medial position), but aspatial effect (e.g., jaw lowering) did not come with the final

    len