
DOCUMENT RESUME

ED 321 318 CS 507 209

AUTHOR Studdert-Kennedy, Michael, Ed.
TITLE Status Report on Speech Research, July-December 1989.

INSTITUTION Haskins Labs., New Haven, Conn.
SPONS AGENCY National Inst. of Child Health and Human Development (NIH), Bethesda, Md.; National Inst. of Health (DHHS), Bethesda, Md. Biomedical Research Support Grant Program.; National Inst. of Neurological and Communicative Disorders and Stroke (NIH), Bethesda, Md.; National Science Foundation, Washington, D.C.

REPORT NO SR-99/100
PUB DATE 89

CONTRACT NO1-HD-5-2910
GRANT BNS-8520709; HD-01994; NS07237; NS13617; NS13870; NS18010; NS24655; NS25455; RR-05596
NOTE 221p.; For the January-June 1989 status report, see CS 507 208.
PUB TYPE Collected Works - General (020) -- Reports - Research/Technical (143)

EDRS PRICE MF01/PC09 Plus Postage.
DESCRIPTORS Auditory Perception; Communication Research; *Language Processing; Language Research; Orthographic Symbols; *Phonology; Reading; *Speech; *Speech Communication
IDENTIFIERS Speech Perception; Speech Research; Vocalization

ABSTRACT

One of a series of semiannual reports, this publication contains 12 articles which report the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. The titles of the articles and their authors are as follows: "Coarticulatory Organization for Lip-rounding in Turkish and English" (Suzanne E. Boyce); "Long Range Coarticulatory Effects for Tongue Dorsum Contact in VCVCV Sequences" (Daniel Recasens); "A Dynamical Approach to Gestural Patterning in Speech Production" (Elliot L. Saltzman and Kevin G. Munhall); "Articulatory Gestures as Phonological Units" (Catherine P. Browman and Louis Goldstein); "The Perception of Phonetic Gestures" (Carol A. Fowler and Lawrence D. Rosenblum); "Competence and Performance in Child Language" (Stephen Crain and Janet Dean Fodor); "Cues to the Perception of Taiwanese Tones" (Hwei-Bing Lin and Bruno H. Repp); "Physical Interaction and Association by Contiguity in Memory for the Words and Melodies of Songs" (Robert G. Crowder and others); "Orthography and Phonology: The Psychological Reality of Orthographic Depth" (Ram Frost); "Phonology and Reading: Evidence from Profoundly Deaf Readers" (Vicki L. Hanson); "Syntactic Competence and Reading Ability in Children" (Shlomo Bentin and others); and "Effect of Emotional Valence in Infant Expressions upon Perceptual Asymmetries in Adult Viewers" (Catherine T. Best and Heidi Freya Queen). An appendix lists DTIC and ERIC numbers for publications in this series since 1970. (SR)


Haskins Laboratories

Status Report on

Speech Research

"PERMISSION TO REPRODUCE THISMATERIAL HAS BEEN GRANTED BY

A. "bitIDOOR 1 AQ

TO THE EDUCATIONALRESOURCES

INFORMATION CENTER (ERIC)"

SR-99/100 JULY-DECEMBER 1989



Haskins Laboratories

Status Report on

Speech Research

SR-99/100 JULY-DECEMBER 1989 NEW HAVEN, CONNECTICUT


Distribution statement

Editor: Michael Studdert-Kennedy

Production Staff: Yvonne Manning

Zefang Wang

This publication reports the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications.

Distribution of this document is unlimited. The information in this document is available to the general public. Haskins Laboratories distributes it primarily for library use. Copies are available from the National Technical Information Service or the ERIC Document Reproduction Service. See the Appendix for order numbers of previous Status Reports.

Correspondence concerning this report should be addressed to the Editor at the address below:

Haskins Laboratories
270 Crown Street
New Haven, Connecticut 06511-6695

This Report was reproduced on recycled paper



Acknowledgment

The research reported here was made possible in part by support from the following sources:

National Institute of Child Health and Human Development

Grant HD-01994
Contract NO1-HD-5-2910

National Institutes of Health

Biomedical Research Support Grant RR-05596

National Science Foundation

Grant BNS-8520709

National Institute of Neurological and Communicative Disorders and Stroke

Grant NS 13870
Grant NS 13617
Grant NS 18010
Grant NS 24655
Grant NS 07237
Grant NS 25455



Investigators

Arthur Abramson
Peter J. Alfonso
Thomas Baer
Eric Bateson
Fredericka Bell-Berti
Catherine T. Best*
Susan Brady*
Catherine P. Browman
Franklin S. Cooper*
Stephen Crain
Robert Crowder*
Lois C.
Alice Faber
Laurie B. Feldman
Janet Fodor*
Anne Fowler
Carol A. Fowler*
Louis Goldstein
Carol Gracco†
Vincent Gracco
Vicki L. Hanson
Katherine S. Harris
Leonard Katz
Rena Arens Krakow
Andrea G. Levitt
Alvin M. Liberman
Isabelle Y. Liberman
Diane Lillo-Martin
Leigh Lisker*
Anders Löfqvist*
Virginia H. Mann
Ignatius G. Mattingly*
Nancy S. McGarr*
Richard S. McGowan
Patrick W. Nye
Lawrence J. Raphael
Bruno H. Repp
Philip E. Rubin
Elliot Saltzman
Donald Shankweiler*
Michael Studdert-Kennedy
Koichi Tsunoda‡
Michael T. Turvey*
Douglas Whalen

*Part-time
†NRSA Training Fellow
‡Visiting from University of Tokyo, Japan

Technical/Administrative Staff

Philip Chagnon
Alice Dadourian
Michael D'Angelo
Betty J. DeLise
Vincent Gulisano
Donald Halley
Raymond C. Huey*
Marion MacEachron
Yvonne Manning
Joan Martinez
Lisa Mazur
William P. Scully
Richard S. Sharkany
Zefang Wang
Edward R. Wiley

Students*

Jennifer Aydelott
Dragana Barac
Paola Bellabarba
Melanie Campbell
Sandra Chiang
Andre Cooper
Margaret Hall Darin
Terri Erwin
Elizabeth Goodell
Joseph Kalinowski
Deborah Kuglitsch
Simon Levy
Katrina Lukatela
Diana Matson
Gerald W. McRoberts
Pamela Mermelstein
Salvatore Miranda
Maria Mody
Weipa Ni
Mira Peter
Nicnii Ren
Christine Romano
Arlyne Russo
Richard C. Schmidt
Jeffrey Shaw
Caroline Smith
Jennifer Snyder
Robin Seider Story
Rosalind Thornton
Mark Tiede
Qi Wang
Yi Xu



Contents

Coarticulatory Organization for Lip-rounding in Turkish and English
Suzanne E. Boyce 1

Long Range Coarticulatory Effects for Tongue Dorsum Contact in VCVCV Sequences
Daniel Recasens 19

A Dynamical Approach to Gestural Patterning in Speech Production
Elliot L. Saltzman and Kevin G. Munhall 38

Articulatory Gestures as Phonological Units
Catherine P. Browman and Louis Goldstein 69

The Perception of Phonetic Gestures
Carol A. Fowler and Lawrence D. Rosenblum 102

Competence and Performance in Child Language
Stephen Crain and Janet Dean Fodor 118

Cues to the Perception of Taiwanese Tones
Hwei-Bing Lin and Bruno H. Repp 137

Physical Interaction and Association by Contiguity in Memory for the Words and Melodies of Songs
Robert G. Crowder, Mary Louise Serafine, and Bruno H. Repp 148

Orthography and Phonology: The Psychological Reality of Orthographic Depth
Ram Frost 162

Phonology and Reading: Evidence from Profoundly Deaf Readers
Vicki L. Hanson 172

Syntactic Competence and Reading Ability in Children
Shlomo Bentin, Avital Deutsch, and Isabelle Y. Liberman 180

Effect of Emotional Valence in Infant Expressions upon Perceptual Asymmetries in Adult Viewers
Catherine T. Best and Heidi Freya Queen 195

Appendix 211



Status Report on

Speech Research


Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 1-18.

Coarticulatory Organization for Lip-rounding in Turkish and English*

Suzanne E. Boyce†

INTRODUCTION

Theories of coarticulation in speech have taken as an axiom the notion that, by coarticulating segments, a speaker is aiding the efficiency of his or her production (Liberman & Studdert-Kennedy, 1977). Although discussion of the forces affecting coarticulation has tended to concentrate on articulatory and/or perceptual pressures operating within particular sequences of segments (Beckman & Shoji, 1984; Martin & Bunnell, 1982; Ohala, 1981; Recasens, 1985), there is a growing body of cross-linguistic work exploring the influence of language-particular phonological structure on coarticulation (Keating, 1988; Lubker & Gay, 1982; Magen, 1984; Manuel, 1990; Ohman, 1966; Perkell, 1986, among others). These studies have generally been concerned with the interaction of coarticulation and segment inventory, or coarticulation and the properties of some particular segment; the question of how coarticulation interacts with phonological rules has been relatively neglected (but cf. Cohn, 1988). Phonological rules, for instance, determine the typical structure of words in a language; we might speculate that languages with different constraints on the possible sequencing of segments pose different challenges to the articulatory planner, and thus that speakers of these languages would vary in the way they implement coarticulation. To take an example, speakers of Turkish, a vowel harmony language with strict rules for the possible sequencing of rounded and unrounded vowels, might feel more pressure to employ rounding coarticulation than English speakers, whose language freely combines rounded and unrounded vowels.

*The research in this paper was supported by NIH grants NS-13617 and BRS RR-05596 to Haskins Laboratories. Suggestions, comments and criticism supplied by Katherine Harris, Louis Goldstein, Michael Studdert-Kennedy, Ignatius Mattingly, Fredericka Bell-Berti, Sharon Manuel, Rena Krakow, Marie Huffman, Joe Perkell and John Westbury, and the JASA review process are gratefully acknowledged. Advice on Turkish linguistics was provided by Jaklin Kornfilt and Engin Sezer.

Rounding coarticulation for sequences of rounded and unrounded vowels in English has been extensively studied. A number of studies have shown that for strings of two rounded vowels separated by non-labial consonants, e.g., /utu/ or /ustu/, both EMG and lip protrusion movement traces show double peaks coincident with the two rounded vowels plus an intervening dip or trough in the signal (Engstrand, 1981; Gay, 1978; McAllister, 1978; Perkell, 1986). (A schematized version of this pattern, representing EMG from the orbicularis oris muscle for the utterance /utu/, is illustrated in Figure 1.) This result has been the focus of a good deal of controversy in recent years, primarily because different theories of coarticulation tend to treat it in different ways. For instance, much previous work on the control mechanisms underlying anticipatory coarticulation has centered on the predictions of one class of models, the "look-ahead" or "feature-spreading" models (Benguerel & Cowan, 1974; Daniloff & Moll, 1968; Henke, 1966). Generally, these models view coarticulation as the migration of features from surrounding phones. In the most explicit form of this type of model, Henke's (1966) computer implementation of articulatory synthesis, the articulatory planning component scans upcoming segments and implements features as soon as preceding articulatorily compatible segments make it possible to do so.1 In the case of /utu/ or /ustu/, the non-labial consonants separating the vowels are made with the tongue and presumably do not conflict with simultaneous lip-rounding. Thus, the fact that troughs occur is a problem for the look-ahead model, because the model would normally predict that the rounding feature for the second vowel would spread onto the preceding consonant, producing a continuous plateau of rounding from vowel to vowel.
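A minimal computational sketch of this prediction follows (illustrative only: the segment inventory, feature names, and right-to-left scan are simplifications assumed for the example, not Henke's actual implementation):

```python
# Sketch of "look-ahead" feature spreading (after Henke, 1966).
# A rounding-neutral consonant anticipates the specification of the
# next specified vowel; the inventory and classes are illustrative.

ROUNDED = {"u"}      # vowels specified [+round]
UNROUNDED = {"i"}    # vowels specified [-round]

def spread_rounding(segments):
    """Assign a rounding value to each segment, scanning right to
    left so that unspecified segments 'look ahead'."""
    specs = [None] * len(segments)
    upcoming = None
    for i in reversed(range(len(segments))):
        s = segments[i]
        if s in ROUNDED:
            specs[i] = upcoming = "+round"
        elif s in UNROUNDED:
            specs[i] = upcoming = "-round"
        else:                    # rounding-neutral consonant
            specs[i] = upcoming
    return specs

# /utu/: the model predicts a continuous [+round] plateau across the
# consonant -- which is why the observed trough is problematic.
print(list(zip("utu", spread_rounding("utu"))))
```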

Figure 1. Schematized version of the "trough" pattern, representing EMG from the orbicularis oris muscle for the utterance /utu/.

Explanations of the trough results have varied widely. Following a more general suggestion of Kozhevnikov and Chistovich (1965), Gay (1978) proposed that the trough represents the resetting of a "syllable-sized articulatory unit," and that coarticulation is allowed to take place only within that unit. Some evidence against this explanation was provided by Harris and Bell-Berti (1984), who found no sign of a trough in sequences such as /uhu/ and /uʔu/. Addressing himself to the sequences typically used in these experiments, Engstrand (1981) took issue with the assumption that alveolar consonants are compatible with full lip-rounding. He argued instead that rounding as found in the vowel /u/ may interfere with optimal acoustic/aerodynamic conditions for these consonants, and that the trough may result from lip movement towards a less-rounded configuration. That such acoustic/aerodynamic constraints may not hold for all subjects was shown by Gelfer, Bell-Berti and Harris (1989), who reported data from a subject with lip protrusion and EMG Orbicularis Oris Inferior (OOI) activity for /t/. Perkell (1986) hypothesized that a diphthongal pattern of movement for the /u/ vowels (i.e., from a less to a more extreme lip position) might, in addition to acoustic and other constraints on the consonants, reduce the extent of rounding in the vicinity of the intervocalic consonant(s). In his own work, however, he found little evidence for diphthongal behavior in those subjects who showed troughs.

Each of these proposals, it should be noted, can be seen as a modification to the look-ahead class of models, in which features of one segment spread to another segment if context conditions allow it. Alternatively, a class of models known as "coproduction," "frame" or "time-locking" models (Bell-Berti & Harris, 1981; Fowler, 1980) assumes that coarticulation results from temporal overlap between independent articulatory gestures belonging to neighboring segments. In these models, lip movement for the utterance /utu/ involves two overlapping rounding gestures (assuming /t/ is not independently rounded). Thus, the presence of a trough is controlled by the degree of overlap between gesture peaks. If the peaks overlap one another, no trough will be discernible, but if the peaks are temporally separated from one another, the model predicts the occurrence of a trough. Another provision of these models is that gestures associated with a particular segment should show a stable profile across different segmental contexts. (It is acknowledged that characteristic gesture profiles may be affected by stress and possibly other prosodic contexts; Tuller, Kelso, & Harris, 1982.) Thus, the temporal extent of coarticulation is predicted by the temporal extent of the gesture. Attempts to test the latter provision of this model by measuring the lag times between the acoustic onset of rounded vowels and related articulatory activity have had varied and sometimes conflicting results, with studies by Bell-Berti and Harris (1979, 1982) and Engstrand (1981) supporting the coproduction prediction of stable lag times, and studies by Lubker (1981), Lubker and Gay (1982), Sussman and Westbury (1981) and Perkell (1986) indicating more variable behavior supportive of the look-ahead view. Thus, although it is unclear how the look-ahead class of models can account for the trough pattern, both types of models remain viable options for explaining coarticulation.
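The overlap provision can be caricatured numerically, as in the sketch below (the bell-shaped activation pulse and all timing values are arbitrary assumptions chosen for illustration, not parameters drawn from the coproduction literature):

```python
# Sketch of the coproduction prediction: two identical, independently
# timed rounding gestures are summed; a trough appears only when the
# gesture peaks are sufficiently separated in time.
import numpy as np

def gesture(t, peak_time, width=80.0):
    # Arbitrary Gaussian activation pulse on a ms time base.
    return np.exp(-0.5 * ((t - peak_time) / width) ** 2)

t = np.arange(0, 600.0)                    # time axis, ms
for separation in (100.0, 300.0):          # peak-to-peak interval
    combined = gesture(t, 150.0) + gesture(t, 150.0 + separation)
    midpoint = combined[int(150 + separation / 2)]
    print(f"separation {separation:.0f} ms: "
          f"trough depth = {1.0 - midpoint / combined.max():.2f}")
# Small separation -> depth near 0 (a plateau); large -> a clear trough.
```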

Regardless of which interpretation of the trough pattern is correct, however, the pattern itself has been found in each of the languages so far surveyed, appearing in English (Bell-Berti & Harris, 1974; Gay, 1978; Perkell, 1986), Swedish (Engstrand, 1981; McAllister, 1978), Spanish and French (Perkell, 1986). It is notable that these languages, while differing in such variables as syllable structure, the tendency to diphthongize vowels, and the presence of a phonological contrast between rounded and unrounded vowels, are alike in their tolerance for mixed sequences of rounded and unrounded vowels. It seemed plausible, at least, that the finding of troughs in lip-rounding activity for these languages might be related to this tolerance, and that a language like Turkish, in which words with mixed rounded and unrounded vowels are the exception, might show lip-rounding patterns other than the trough pattern. In particular, it seemed that Turkish might provide particularly favorable conditions for anticipatory coarticulation of the kind predicted by the look-ahead class of models. In brief, the hypothesis was that Turkish speakers would exhibit plateau patterns of activity for lip-rounding. The experiment described below was part of a larger study designed to test this hypothesis (Boyce, 1988). A second aim of the experiment was to test the degree to which the coproduction model's explicit prediction of stable, independent gestures could be used to predict both English and Turkish movement patterns.

EXPERIMENT

Four speakers of American English and four speakers of Standard Turkish produced similarly structured nonsense words designed to show the presence or absence of troughs in lip-rounding. Corpus words for this purpose consisted of the series /kuktluk/, /kuktuk/, /kukuk/, /kutuk/, /kuluk/. Because arguments concerning the trough pattern often hinge on questions concerning the production of the intervocalic consonants in an unrounded environment (Benguerel & Cowan, 1974; Gelfer, Bell-Berti, & Harris, 1989), the words /kiktlik/, /kiktik/, /kikik/, /kitik/, /kilik/ were included as controls. Additionally, words with rounded vowels followed by unrounded vowels /kuktlik/, /kuktik/, /kukik/, /kutik/, and /kulik/, and words with unrounded vowels followed by rounded vowels /kiktluk/, /kiktuk/, /kikuk/, /kituk/, and /kiluk/ were included to provide data on single protrusion movements. In the remainder of the paper, words with vowel sequences u-u, i-i, etc. will be referred to as u-u, i-i, u-i, and i-u words. The words with intervocalic /ktl/, which had the longest vowel-to-vowel intervals, were included to provide the clearest test case for the presence of a trough pattern. Words with shorter intervocalic consonant intervals were included to provide control information on the lip activity patterns for different consonants. The carrier phrase for Turkish speakers was "Bir daha deyiniz" (pronounced as phonetically spelled and meaning 'Say once again'). The English carrier phrase was "It's a ___ again."

English-speaking subjects included one male (AE) and three females (MB, AF and NM), each of whom spoke a variety of General American with no marked regional or dialectal accent. The Turkish speakers included one female (IB) and three males (AT, EG and CK). All spoke similar varieties of Standard Turkish.

Additional facts about Turkish which impinge on the arguments made in this paper have been summarized in the Appendix. For the present, it is sufficient to note that rounding in Turkish operates according to a vowel harmony rule which, in essence, causes sequences of high vowels to acquire the rounding specification of the preceding leftmost vowel. (With minor exceptions, consonants do not participate in this process.) The effect is to produce long strings of rounded or unrounded vowels whose rounding is predictable given the first vowel in the sequence. While vowel harmony is a productive rule for the vast bulk of the lexicon, there are numerous exceptions, mainly from Arabic and Persian borrowings. Real word counterparts exist for each of the vowel sequences in the experimental corpus, although u-u and i-i words conform to vowel harmony while i-u and u-i words do not.

For Turkish subject EG, utterances were randomized and the randomized list repeated 15 times. Utterances in later subject runs were blocked, so that utterances were repeated in groups of five tokens (three for MB), utterances with the same vowel combinations were grouped together, and the same order of consonant combinations was repeated for each vowel combination. The order of vowel and consonant combinations was different for each subject, except that one Turkish speaker (IB) and one English speaker (AF) had the same order of presentation.

Although Turkish has final stress, the degree of difference between stressed and unstressed syllables is much less than in English (Boyce, 1978). Therefore, English speakers were encouraged to use equal stress on both syllables of the disyllabic nonsense words, and if equal stress felt unnatural, to place stress on the final rather than the initial syllable. Turkish subjects were given no instructions about stress. All subjects were instructed to speak at a comfortable rate, in a conversational manner.

Instrumentation

Movement data from the nose, upper and lower lip, and jaw were obtained by means of an opto-electrical tracking system, similar to the commonly used Selcom Selspot system. The system consists of infrared light emitting diodes (LED's) attached to the structure of interest. LED position is sensed by a photo-diode within a camera positioned to capture the range of LED movements in its focal plane. The output of this diode is translated by associated electronics into pairs of X and Y coordinate potentials for each LED, each with a maximum frequency response of 500 Hz. Calibration is achieved by moving a diode through a known distance in the focal plane.

LED's were attached to the subject's nose, upper lip, lower lip and jaw with double-sided tape. The nose LED was placed on the bridge of the nose, slightly to the left side, at a point determined to show the least speech-related wrinkling, waggling, etc. LED's were placed just below the vermilion border of the upper lip and just above the vermilion border of the lower lip, in a plane with the nose LED, at a point judged to show the axis of anterior-posterior movement for each articulator. The movement of the subject's skin between the lower lip and chin was observed during production of rounded vowels, and the jaw LED positioned to best reflect anterior-posterior movements of the mandible rather than skin and muscle. Generally this was at the point of the chin or under it, in a plane with the higher LED's.

The LED-tracking camera was positioned at 90 degrees to the left of the subject's sagittal midline, at a camera-to-subject distance (21 inches) that provided a 10- by 10-inch field of view. When centered approximately on the upper/lower lip junction, during maintenance of a position appropriate for bilabial closure, this field is large enough to capture the full range of anterior-posterior LED movement, as well as allowing for some degree of head movement.

A video camera was positioned 90 degrees to subject midline on the subject's right, and focused as narrowly as possible, while continuing to keep all 4 LED's within the field of view. Five subjects were videotaped throughout the experiment: English subjects AF and NM, and Turkish subjects AT and IB. An additional videotape of English subject MB producing the words /kiktlik/, /kuktluk/, /kiktluk/ and /kuktlik/ was obtained in a separate session.

A simultaneous audio recording of the subject's speech during the experiment was made on a Sennheiser "shotgun" microphone.

The EMG recordings were made with adhesive surface silver-silver chloride electrodes. These were placed just below and above the vermilion border of upper and lower lips, laterally to the midline. According to Blair and Smith (1986), an electrode at this location is likely to pick up relatively more activity from orbicularis oris, and less of nearby muscles, than at other locations along the lip edge. Pick-up from the desired muscles, Orbicularis Oris Inferior (OOI) and Orbicularis Oris Superior (OOS), was checked by having the subject produce repeated /u/ or /i/ vowels several times in succession; if a strong signal was evidenced for /u/ and little or no signal for /i/, the EMG electrode was assumed to be well-placed.

The EMG and movement signals, together with audio and clock signals, were recorded onto a 14-channel FM tape recorder (EMI series 7000). The EMG signals were rectified, integrated over a 5 ms window, and sampled at 200 Hz. Movement signals were also sampled at 200 Hz. The audio channel was filtered at 500 Hz and sampled at 10000 Hz. By means of the simultaneous clock signals, data from all channels were synchronized to within 2.5 ms.
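In outline, the EMG conditioning chain amounts to rectification, smoothing, and decimation; a sketch follows (the 10 kHz raw sampling rate is an assumption made for illustration; the report specifies only the 5 ms window and the 200 Hz output rate):

```python
# Sketch of the EMG conditioning described above: full-wave
# rectification, integration over a 5 ms window, output at 200 Hz.
import numpy as np

def condition_emg(raw, fs_in=10_000, fs_out=200, window_ms=5.0):
    rectified = np.abs(raw)                  # full-wave rectification
    win = int(fs_in * window_ms / 1000.0)    # 5 ms -> 50 samples
    smoothed = np.convolve(rectified, np.ones(win) / win, mode="same")
    return smoothed[::fs_in // fs_out]       # downsample to 200 Hz

emg = condition_emg(np.random.randn(10_000))  # 1 s of fake raw EMG
print(emg.shape)                              # -> (200,)
```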

The signal from the nose LED was numerically subtracted from respective lip and jaw signals to control for changes in baseline due to head movement. Differences in baseline between early and late portions of the experiment remained for some speakers, presumably due to vertical rotational movement of the head, in which the lips and nose, or jaw and nose, moved by different amounts in space. The speakers most affected were English speaker AE, for whom the total horizontal lower lip baseline change was approximately 3 mm, and Turkish speakers IB, EG and CK, for whom the total changes were approximately 3.5, 6.5, and 6 mm respectively. In each case, baseline change reflected movement in the posterior direction. Baseline change for other speakers was within 1 mm of movement. Rotational movement of this type, in which the chin sank gradually toward the base of the neck, was confirmed in the videotape for subject IB (other videotaped subjects showed little baseline change). These baseline changes did not appear to affect the data in any significant way.2
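The head-movement correction itself is a per-sample subtraction of the nose trace from each articulator trace (a sketch; the array names are illustrative, and, as noted above, the subtraction cancels translation of the head but not its rotation):

```python
# Head-movement correction: subtract the nose LED trace, sample by
# sample, from a lip or jaw trace recorded in the same dimension.
# Pure rotation of the head is NOT removed by this step.
import numpy as np

def remove_head_motion(articulator, nose):
    return np.asarray(articulator) - np.asarray(nose)

lower_lip_x = np.array([12.0, 12.4, 12.9])   # mm, illustrative values
nose_x = np.array([0.1, 0.2, 0.2])
print(remove_head_motion(lower_lip_x, nose_x))
```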

Because of recording or calibration problems, the upper lip movement signal and both EMG signals for Turkish subject AT, the EMG OOS signal for Turkish subject EG, and the EMG OOI signal for Turkish subject CK were eliminated from the study. Except for AT, therefore, the full complement of movement signals, and at least one EMG signal, was available for each subject. Recording or calibration problems also caused some of the 15 repetitions (tokens) planned for words in the experimental corpus to be discarded. The upper lip signal level for English subject AE deteriorated after the first block of utterances. Thus, only the first five tokens for this signal are reported.

Two acoustic reference points, or lineups, were identified for each token. The first, the V1 offset, was defined as the point where the formant structure disappeared from the waveform at the onset of closure for /k/ and /t/, or the point of sudden amplitude change marking the change between the vowel and the voiced approximant /l/. The second, the V2 onset, was defined as the release of the consonant occlusion for /k/ and /t/, or the point of amplitude change for /l/. Consonant interval duration measurements consisted of the time between these two points. The audio waveform, movement and EMG signals for each repetition of an utterance in the experimental corpus were extracted into a separate computer file. Each file contained a 2000 ms slice of speech with constant dimensions before and after the V1 offset point.

The main body of movement data reported here comes from the anterior-posterior upper and lower lip signals. These signals are referred to in the text as Upper Lip X (ULX) and Lower Lip X (LLX). Both signals reflect lip protrusion, which is generally acknowledged to be the most reliable single index of lip rounding. However, because rounding may also involve vertical motion of the lips, to narrow the lip aperture, and because vertical movement and protrusion of the lower lip may be effected by movements of the jaw, anterior-posterior jaw (JX) and inferior-superior jaw (JY) and lip signals (ULY, LLY) were examined as well.

As a rule, token-to-token variability was minimal in both movement and EMG signals. Accordingly, much of the presentation in this paper is based on movement and EMG traces produced by ensemble averaging. (Those cases where token-to-token variability was greater than implied by the averaged signal are mentioned in the text.) Signals were ensemble averaged using the acoustic V1 offset as a lineup point.
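Because every extracted file spans a constant interval around the V1 offset, ensemble averaging reduces to sample-wise averaging across tokens, as sketched here (the array sizes assume the 2000 ms slices at 200 Hz described above):

```python
# Ensemble averaging across tokens aligned at the acoustic V1 offset:
# with a fixed slice around the lineup point, the same sample index
# corresponds to the same time relative to V1 offset in every token.
import numpy as np

def ensemble_average(tokens):
    """tokens: list of equal-length 1-D arrays, one per repetition."""
    return np.vstack(tokens).mean(axis=0)

# 15 repetitions of a 2000 ms movement trace sampled at 200 Hz
tokens = [np.random.randn(400) for _ in range(15)]
print(ensemble_average(tokens).shape)        # -> (400,)
```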

Results

Turkish Speakers

Movement and EMG signals were examined separately for Turkish and English subjects, with a view to determining characteristic movement and muscle activity patterns for u-u words. Figures 2 through 5 show the averaged ULX, LLX and EMG traces for /kuktluk/ and /kiktlik/ as produced by the four Turkish speakers AT, IB, EG and CK.

Overall, the /kuktluk/ movement traces for these subjects tended to resemble a plateau, with the protrusion traces being flat or slightly falling over the course of the word. Exceptions to this pattern are the occurrence of a peak, or bump, during the consonant interval in the LLX traces for subjects EG and CK, and the slight trough located at the beginning of V2 in the ULX signal for subject EG. For the three Turkish speakers with EMG data (OOS and OOI for IB, OOI for EG and OOS for CK), there was no conspicuous diminution of EMG activity during the consonant interval. The general pattern was unimodal. For IB and CK, there was an early peak on V1 followed by a long, sustained offset and some indication of increased activity during V2. For subject EG, the EMG peak was located close to V2 in /kuktluk/ and to V1 in all other u-u words. (Movement patterns, however, were similarly plateau-like over EG's different u-u words.)3


Figure 2. Averaged lower lip movement traces (dashed line) plus single token acoustic waveform traces, for /kuktluk/ (15 tokens) and /kiktlik/ (15 tokens) as produced by Turkish speaker AT. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for the lower lip (LL) trace is 0-20 mm. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.


Figure 3. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ (13 tokens) and /kiktlik/ (13 tokens) as produced by Turkish speaker IB. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-20 mm. The vertical scale for both EMG OOS and EMG OOI traces is 200 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.


Figure 4. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ (14 tokens) and /kiktlik/ (13 tokens) as produced by Turkish speaker EG. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-20 mm. The vertical scale for the EMG OOI trace is 300 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.


,"

i

LL

EMO 001

1

% ft. ap ay

ILAAUDIO

1 1

KIKTUK

wo

%.14

.11.

UL

EMO OOS

a

CK

1.a.

..4 .... ast 46

1

... de.

1

LL

de e .... ma 1I. %

EIAO 001

1 1

01. =IP

I

AUDIO

ireill44"---0141fisko

-400 -200 0 200 400 SOO -400 -200 0 200 400 SOO

Figure 5. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ (5 tokens) and /kiktlik/ (15 tokens) as produced by Turkish speaker CK. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-10 mm. The vertical scale for the EMG OOS trace is 150 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.

As noted in the introduction, it is well known that some English speakers may protrude and/or narrow their lips for non-labial consonants, most notably /t/ (Gelfer, Bell-Berti, & Harris, 1989) and /l/ (Brown, 1981; Leidner, 1973). Looking at the /kiktlik/ traces, it appears that Turkish subjects AT, IB and EG do not produce significant independent protrusion during the intervocalic consonants. Although small fluctuations in LLX signals may indicate some degree of active lower lip protrusion, these may also be due to lip relaxation from a retracted position during the flanking /i/ vowels. The strongest degree of movement during the /kiktlik/ consonant interval is seen for subject CK. It is hard to tell if this reflects active movement rather than passive relaxation, however, as CK retracts lip and jaw heavily for the sustained /a/ of "daha" in the carrier phrase (Boyce, 1988). EMG traces for all subjects during i-i words were flat.

English subjects in this study could be divided into two groups based on the appearance of their horizontal lower lip signals for u-u and i-i words. (Upper lip signals were less clearly differentiated.) Examples of these patterns can be seen in Figures 6 and 7, which show the averaged ULX, LLX and EMG traces for /kuktluk/ and /kiktlik/ as produced by English subjects AE and AF. For both speakers, /kuktluk/ movement traces showed double-peaked trough patterns. A similar double-peaked pattern can be seen for subject AF's EMG OOI and OOS. For subject AE, the EMG OOS trace is clearly double-peaked. The EMG OOI trace also shows two peaks, but with an additional peak between.4

The right-hand panels of Figures 6 and 7 show averaged ULX, LLX and EMG traces for the word /kiktlik/. As can be seen, for subject AF there is little or no movement for either lip during the intervocalic consonant interval, and little or no activity in the EMG signal. Some fluctuation in the movement signal for LLX is present for AE. Comparison with the presumably neutral position of the lips during the schwa vowel from "again" at the end of the carrier phrase (between 400 and 600 ms after the V1 offset point) suggests that this may be due to lip retraction during the flanking vowels, with relaxation of the lips during the consonant interval. Alternatively, some small active forward movement of the lower lip and/or jaw may be involved.


Figure 6. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ and /kiktlik/ as produced by English speaker AE. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. Upwards deflection represents anterior movement. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-20 mm. The vertical scale for both EMG OOS and EMG OOI traces is 200 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The upper lip traces for /kuktluk/ and /kiktlik/ are averaged from 5 tokens. The lower lip trace for /kuktluk/ is averaged from 15 tokens, that for /kiktlik/ from 13 tokens. The horizontal scale is time in ms.


Figure 7. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ (15 tokens) and /kiktlik/ (15 tokens) as produced by English speaker AF. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-20 mm. The vertical scale for both EMG OOS and EMG OOI traces is 500 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.


The left and right panels of Figures 8 and 9 show averaged ULX, LLX and EMG traces for English subjects MB and NM. Looking at the /kuktluk/ words, we see that for both MB and NM, the lower lip trace shows three peaks of movement. The first peak is located during the "It's a" of the carrier phrase (at approximately 350 ms before the V1 offset point) and probably indicates protrusion associated with /s/. The central, and largest, peak is located during the intervocalic consonant interval (between the two vertical lines). The third peak is located during the second vowel. At the same time, MB and NM's EMG OOI traces show trough patterns like those of English subjects AE and AF (NM's EMG OOS trace, like AE's OOI trace, shows an additional peak after V1 offset). The upper lip pattern for subject NM also resembles those for subjects AF and AE. Subject MB's upper lip movement pattern contains two peaks, which correspond roughly in time to the central and final peaks of the lower lip trace.

There is less apparent consistency between upper lip, lower lip, and EMG traces for these subjects than for subjects AE and AF. Looking at their /kiktlik/ traces, however, we see that both subjects MB and NM show protrusion in the lower lip signal during the consonant interval. There is also some protrusion in MB's upper lip /kiktlik/ trace. The latter is likely to reflect active forward movement since there is no sign of upper lip retraction on the flanking /i/ vowels.


Figure 8. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ (15 tokens) and /kiktlik/ (15 tokens) as produced by English speaker MB. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-15 mm. The vertical scale for both EMG OOS and EMG OOI traces is 800 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.



Figure 9. Averaged movement (dashed line) and EMG (solid line) traces, plus single token acoustic waveform traces, for /kuktluk/ (15 tokens) and /kiktlik/ (14 tokens) as produced by English speaker NM. Upwards deflection represents anterior movement. The vertical line indicates the lineup point for ensemble averaging, which was at the acoustic offset of V1. The vertical scale for both upper lip (UL) and lower lip (LL) traces is 0-10 mm. The vertical scale for both EMG OOS and EMG OOI traces is 500 μV. Square brackets in the lower panel indicate approximate acoustic boundaries for these words. The horizontal scale is time in ms.

In Figures 10 and 11, we see the lower lip signal for /kiktlik/ overlaid with that for /kuktluk/ for English subjects MB and NM. (Baseline differences between averaged traces have been adjusted when necessary, so as to visually align carrier phrase portions of each trace.) From this, it is clear that the timing of the consonant interval protrusion peak in /kiktlik/ is very similar to the timing of the central peak in these subjects' /kuktluk/ traces. This is most striking for MB, whose protrusion peak in /kiktlik/ was also similar in amplitude to that of /kuktluk/. For NM, the central peak in /kuktluk/ was slightly bimodal, and the protrusion peak in /kiktlik/ is more nearly matched in timing to the second inflection. For both subjects, a similar congruence of peaks can be seen when lower lip /kiktluk/ traces (shown in Figures 20 and 21) are overlaid with /kiktlik/ traces.

Figure 10. Superimposed /kuktluk/ and /kiktlik/ averaged lower lip protrusion traces as produced by English subject MB. The vertical line is V1 offset. Vertical and horizontal scales are as in Figure 8.


Figure 11. Superimposed /kuktluk/ and /kiktlik/ averaged lower lip protrusion traces as produced by English subject NM. The vertical line is V1 offset. Vertical and horizontal scales are as in Figure 9.


These observations suggest that the central peak of the lower lip trace for /kuktluk/ may be largely due to protrusion for one or more of the intervocalic consonants. The protrusion in MB's upper lip trace may also reflect movement for consonants.5 The implication of these observations is that protrusion movement during the consonants in /kuktluk/ is independent of its rounded vowel context. It is interesting, in this context, that both EMG OOS and OOI signals for /kuktluk/ are less strong during the consonant interval than during the rounded vowels. It seems likely that some or all of the consonant interval protrusion for these signals is due to jaw activity, perhaps as a consequence of jaw raising for the consonantal occlusions.6

Summary

The basic question behind the experiment was whether English and Turkish speakers would show the same articulatory patterns when producing similar words with rounded vowels separated by rounding-neutral consonants. The data presented here indicate that they do not. Rather than showing the consistent trough-like movement and EMG patterns exhibited by English speakers AE and AF, and reported in the literature for speakers of English, Swedish, Spanish and French, Turkish speakers show a consistent plateau-like pattern of movement and a unimodal pattern of EMG activity (with the possible exception of the ULX signal for subject EG). Equally, the Turkish subjects' patterns of movement and EMG contrast with the multi-peaked movement pattern and trough-like EMG patterns of English speakers MB and NM. Additionally, the latter two groups differ in the degree of consonant-related protrusion seen in i-i utterances.

The look-ahead model, as modified by Engstrand (1981), might account for these data in the following way: (1) for English speakers such as AE and AF, full lip protrusion (i.e., to the degree found in rounded vowels) is prohibited during one or more of the intervocalic consonants used in this study (see footnote 5); (2) for English speakers such as MB and NM, full lip protrusion for one or more consonants is required; (3) for Turkish speakers lip protrusion is compatible with but not required during these consonants, so that the degree of lip protrusion seen is dictated by feature spreading from the segmental context. Thus, for the English speakers, consonants must have some phonetic feature specification associated with protrusion, although this may be either plus or minus, while for the Turkish speakers consonants are allowed to have neutral specification for this feature. It should be noted that for this version of the look-ahead theory, because the context-independence of gestures is not a theme, there is no straightforward prediction of a relationship between the u-u and i-i word data. It is possible to say, for instance, that for speakers such as AE and AF lessened protrusion on consonants is a reaction to a strongly protruded environment, and the behavior of the same consonants in an unrounded environment is irrelevant.

For the coproduction model, in which articulatory output trajectories are the result of combining sequences of relatively stable, independently organized gestures, data from other contexts such as the i-i words becomes more important. In this model, the fact that the central peak in the u-u word movement traces for English subjects MB and NM has a counterpart in the i-i word traces is particularly relevant, as it suggests that the consonant-related peak in the u-u word traces may be independent of the gestures for the flanking vowels. Similarly, the relative lack of movement in the i-i word traces for speakers AE and AF suggests that the trough patterns in their u-u word traces result from combining overlapping vowel gestures with a small or non-existent consonant gesture. For the Turkish data, on the other hand, the lack of movement associated with the consonant(s) in the i-i word traces, together with the lack of a trough pattern in the u-u word traces, means that a different explanation is called for. According to the coproduction model, there are several possibilities. First, gestures for rounded vowels in Turkish (in contrast to those for English) may simply combine so as to produce a plateau pattern. This could happen, for instance, if Turkish gestures were larger or if the gesture-to-gesture interval were shorter, such that their overlap results in little or no trough. Alternatively, Turkish may have a different algorithm for combining gestures. Finally, the peculiar phonological properties of vowel harmony may result in successive rounded segments being associated with the same protrusion gesture.

The differences seen here between Turkish and English are also interesting in terms of the other theories mentioned above. For instance, if the trough in English is assumed to be a marker of syllable boundary, then the plateau pattern in Turkish may be taken to indicate that Turkish does not mark syllable boundaries in this way. Further, Turkish vowels such as /u/ are (reportedly) not diphthongized, so that the lack of a trough in Turkish is compatible with a diphthongal account of the trough in English. Note, however, that for these theories the lack of a trough for English subjects MB and NM is somewhat problematic. It is necessary to assume either that the explanation does not apply to all English speakers or that the specification of protrusion for the intervening consonants obscures, in some fashion, the marking of syllable boundaries or the pattern of diphthongization.

Part II

Given the data reported here, it is not possible to test either the look-ahead, the syllable marker, or the diphthongization theories further. However, the (phonetic) context-free provision of the coproduction theory makes it amenable to testing based on articulatory behavior in different phonetic contexts. In essence, the logic is as follows: if articulator trajectories over several segments reflect the combination of gestures for each of the segments, then it should be possible to deduce the basic shape of each gesture from its behavior in different contexts. It should also be possible to synthesize articulatory contours by combining their elements.

Accordingly, this section of the paper describes a series of tests based on the context-free provision of the coproduction model. In the first test, the consonant-related protrusion gestures seen for English subjects MB and NM in i-i words are subtracted from corresponding protrusion traces for u-u words. Success is a function of correspondence, for the same speaker, between subtracted traces and other u-u word traces, such as EMG traces, with no suggestion of consonant interval protrusion. In other words, because the coproduction interpretation of inconsistencies between upper lip, lower lip, and EMG signals for these speakers involves the presence of an independent consonant-related protrusion gesture, removing the additional gesture should resolve the inconsistencies. In the second test, it is assumed that, if the vowel- and consonant-related gestures seen in the corpus are independently organized, then it should be possible to construct a viable u-u word from elements in i-u and u-i words. Thus, the original protrusion traces from i-u and u-i words are added together and the result compared to original u-u word traces. Success here is a function of degree of correspondence between original u-u word signals and the synthesized versions. The use of subtraction and addition for gesture combination is based on data reported by Lofqvist (1989), Saltzman, Rubin, Goldstein and Browman (1987), and Saltzman and Munhall (1989).

Both tests require similar intersegment timing of consonant and vowel gestures in the i-i, u-u and mixed-vowel words. Although explicit measures of gestural timing were not made, mean intervocalic consonant intervals (from acoustic offset of V1 to acoustic onset of V2) among one-, two- and three-consonant words varied by less than 35 ms for any English or Turkish subject. This was taken as evidence that speech rate and gesture phasing were similar enough for corresponding gestures to be equated.

Subtraction Test

Those i-i word signals showing protrusion in the consonant interval consisted of upper and lower lip movement traces for subject MB and lower lip traces for subject NM. For the first test (henceforth called the Subtraction Test), these traces were subtracted, point by point, from corresponding u-u word movement traces. The theory behind this procedure was that the underlying movement during the consonant interval, i.e., the portion of movement associated with the vowel gestures, would be the same for both i-i and u-u words.7
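Concretely, the procedure is a point-by-point difference of averaged traces on the shared lineup grid (a sketch; the trace names are illustrative):

```python
# Subtraction Test: remove the presumed consonant-related protrusion
# component (visible in the i-i word) from the u-u word trace. Both
# inputs are ensemble averages aligned at the acoustic V1 offset.
import numpy as np

def subtraction_test(uu_trace, ii_trace):
    uu, ii = np.asarray(uu_trace), np.asarray(ii_trace)
    assert uu.shape == ii.shape, "traces must share the lineup grid"
    return uu - ii          # residual, vowel-related protrusion

# e.g., subtracted = subtraction_test(avg_kuktluk_llx, avg_kiktlik_llx)
```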

Figures 12 and 13 show the results of subtracting averaged /kiktlik/ from averaged /kuktluk/ movement traces, superimposed on original /kuktluk/ movement traces for these subjects. For comparison purposes, Figure 13 also shows the results of subtracting NM's averaged upper lip /kiktlik/ movement trace, which showed no sign of protrusion during the consonant interval, from her averaged upper lip /kuktluk/ trace.

All four subtracted traces in Figures 12 and 13 show a trough pattern. This is to be expected for the upper lip trace of subject NM, since her original /kuktluk/ traces showed a trough and her /kiktlik/ trace is essentially flat. It is striking, however, that the trough patterns for both subjects' lower lip traces, and for MB's upper lip trace, correspond more neatly to these subjects' EMG trough patterns (seen in Figures 8 and 9) than did the original u-u traces. Further, NM's subtracted lower lip trace is nearly identical to both her subtracted and original upper lip trace.

This result supports the hypothesis that subjects NM and MB have separate vowel- and consonant-related behavior for protrusion. Further, it suggests that their vowel-related protrusion behavior, presumably connected to articulatory instantiation of the vowel feature of


rounding, resembles that of other English speakers in being trough-like. The presence of lip protrusion during consonant articulation suggests, not a different articulatory organization, but an additional gesture overlapping with vowel-related gestures. At a more general level, this result can be taken as support for the coproduction model notion that gestures are independent entities and for the notion that gesture combination is approximately additive.

Addition Test

For the second test (henceforth known as the Addition Test), averaged upper and lower lip movement traces for i-u and u-i words were added together for each English and Turkish subject. Because the result of adding i-u and u-i word traces is theoretically equal to the result of adding u-u and i-i word traces, the averaged i-i word traces were then subtracted from each added trace to produce a 'constructed' u-u trace. Figures 14-21 show the results of this procedure for lower lip data from /kuktluk/, /kiktlik/, /kuktlik/ and /kiktluk/ (upper lip data are substantially the same). The top panels show averaged /kuktlik/ and /kiktluk/. The traces resulting from adding these and subtracting /kiktlik/ (henceforth known as constructed traces) are shown in the bottom panels together with superimposed original u-u traces.
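Under the additivity assumption, (i-u) + (u-i) = (u-u) + (i-i), so the constructed trace is again pointwise arithmetic; a sketch (the RMS metric is an illustrative choice only; the paper itself compares constructed and original traces visually):

```python
import numpy as np

def construct_uu(iu: np.ndarray, ui: np.ndarray, ii: np.ndarray) -> np.ndarray:
    """Construct a u-u protrusion trace from averaged i-u, u-i and i-i word
    traces sampled on a common time base: (i-u) + (u-i) - (i-i)."""
    return iu + ui - ii

def rms_difference(constructed: np.ndarray, original: np.ndarray) -> float:
    """One possible numerical index of how well a constructed trace
    parallels the original u-u trace."""
    return float(np.sqrt(np.mean((constructed - original) ** 2)))
```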

As these figures show, the constructed traces paralleled the original traces quite closely for three out of four English subjects, and for two out of the four Turkish subjects. For English subjects MB and AE, in particular, the traces parallel one another quite closely. For Turkish subject CK the principal difference is the slightly lower amplitude of the constructed trace.8 For English subject NM, differences are also minimal. For Turkish subject EG, differences are intensification of a slight "bump" existing in the original trace plus a slightly lowered amplitude of movement during the final /u/ vowel. For the remaining subjects, however, differences are more serious. English subject AF's constructed trace shows a single broad peak rather than a trough as in the original trace. In contrast, Turkish subject IB's constructed trace shows a trough rather than a plateau as in the original trace. Turkish subject AT's constructed trace, while paralleling the original trace during the final vowel, has a generally different shape from the plateau pattern of the original trace.


Figure 12. Original averaged /kuktluk/ protrusion traces (solid lines) with superimposed trace achieved by subtracting original averaged /kiktlik/ trace from original averaged /kuktluk/ trace (dashed line), for English subject MB. Upper panel shows original and subtracted traces for the upper lip, lower panel shows the same for the lower lip. The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 8.


Figure 13. Original averaged /kuktluk/ protrusion traces (solid line) with superimposed trace made by subtracting original averaged /kiktlik/ trace from original averaged /kuktluk/ trace (dashed line), for English subject NM. Upper panel shows original and subtracted traces for the upper lip, lower panel shows the same for the lower lip. The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 9.


Figure 14. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by Turkish subject AT. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 2.


Figure 15. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by Turkish subject IB. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 3.


Figure 16. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by Turkish subject EG. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 4.


Figure 17. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by Turkish subject CK. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 5.


Figure 18. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by English subject AE. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 6.


Figure 20. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by English subject MB. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 8.


Figure 19. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by English subject AF. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 7.


Figure 21. Upper panel shows averaged lower lip protrusion traces for /kuktlik/ and /kiktluk/ as produced by English subject NM. Lower panel shows original averaged protrusion trace for /kuktluk/ (solid line) with superimposed trace constructed by adding averaged traces for /kuktlik/ and /kiktluk/ and subtracting averaged trace for /kiktlik/ (dashed line). The vertical line is the V1 offset. Vertical and horizontal scales are as in Figure 9.


DISCUSSION

The results of the Subtraction Test constitute relatively strong evidence for the generalization that English speakers produce troughs for words such as /kuktluk/, and for the notion that gestures are independent entities whose trajectories combine when overlapped in time. The Subtraction Test results also suggest that additivity is at least a reasonable approximation of the way that gestures combine for these articulators and these segments.

The results of the Addition Test are less clear. While the predicted and actual trajectories were close for some subjects, for other subjects they were qualitatively different. While the results were slightly better for English subjects than for Turkish subjects, the distinction between three subjects out of four (for English) vs. two subjects out of four, or even one out of four (for Turkish), is hardly great enough to warrant concluding the two languages are different. It is also not clear how to interpret a lack of correspondence between constructed and original traces; for instance, the assumption of similar conditions of speech rate, stress and gesture phasing between averaged i-u, u-i, u-u and i-i words may not be accurate. The fact that in Turkish i-u and u-i words are non-harmonic is also relevant. It is possible, for instance, that /u/ and /i/ vowels in Turkish words are always produced with independently organized gestures, but that these gestures are different in harmonic and non-harmonic words. A fuller discussion of these issues can be found in Boyce (1988).

General Discussion

Overall, the results of this study suggest there is something very different in the way English and Turkish speakers organize articulation, at least in the way they use lip protrusion for rounded segments. The simplest index of this difference is the plateau pattern of protrusion evinced by the Turkish speakers, which contrasts with the English patterns found here, and with the trough patterns reported in the literature to date. The results for English subjects, and in particular the results of the Subtraction Test for subjects MB and NM, confirm that the underlying articulatory strategy for u-u words in English follows a trough pattern.

With regard to the competing coproduction and look-ahead types of models, interpretation of these results is both straightforward and complex. The straightforward interpretation is as follows. Since the coproduction model predicts a trough pattern in u-u words, and English shows a trough pattern,

then English speakers employ a coproduction articulatory strategy. Since the look-ahead model predicts a plateau pattern in u-u words, and Turkish shows a plateau pattern, then Turkish speakers employ a look-ahead strategy. Thus, English and Turkish have different articulatory strategies.

This interpretation gains strength from the fact that, for each model, explaining the patterns of both English and Turkish requires an additional mechanism. To explain the English trough pattern the look-ahead model must posit additional effects such as syllable-boundary marking, diphthongization, or consonant-specific unrounding in a rounded context. Similarly, to explain the Turkish plateau pattern the coproduction model must posit an unknown effect that causes i-u and u-i vowel gestures to differ from those in u-u words, or an unknown principle of gesture combination, or a loosening of the notion that gestures may be associated with only one segment. While any of these posited effects may ultimately prove to be valid, their status at this stage of investigation appears to be weak.

The complexity of this interpretation lies in the conclusion that different languages may employ different articulatory strategies. In some sense, this is to be expected, since the combination of phonology, lexicon and syntax in different languages may impose entirely different challenges to articulatory efficiency. In fact, the hypothesis behind this comparison of Turkish and English was the notion that, in contrast to English, Turkish provides ideal conditions for articulatory look-ahead. At the same time, human beings presumably come to the task of language acquisition with the same tools and talents. The finding that current models of coarticulation are insufficient to account for language diversity indicates that we have not yet penetrated to the universal level in the way we think about speech production. Further research, and in particular more cross-linguistic research, is needed in order to close this gap.

REFERENCES

Beckman, M., & Shoji, A. (1984). Spectral and perceptual evidence for CV coarticulation in devoiced /si/ and /syu/ in Japanese. Phonetica, 41, 61-71.

Bell-Berti, F., & Harris, K. S. (1974). More on the motor organization of speech gestures. Haskins Laboratories Status Report on Speech Research, SR-37/38, 73-77.

Bell-Berti, F., & Harris, K. S. (1979). Anticipatory coarticulation: Some implications from a study of lip rounding. Journal of the Acoustical Society of America, 65, 1268-1270.

Bell-Berti, F., & Harris, K. S. (1981). A temporal model of speech production. Phonetica, 38, 9-20.


Bell-Berti, F., & Harris, K. S. (1982). Temporal patterns of coarticulation: Lip rounding. Journal of the Acoustical Society of America, 71, 449-454.

Benguerel, A.-P., & Cowan, H. (1974). Coarticulation of upper lip protrusion in French. Phonetica, 30, 41-55.

Blair, C., & Smith, A. (1986). EMG recording in human lip muscles: Can single muscles be isolated? Journal of Speech and Hearing Research, 29, 256-266.

Boyce, S. (1978). Accent or accident?: The acoustical correlates of word-level prominence in Turkish. Unpublished Bachelor of Arts thesis, Harvard University (Linguistics Department).

Boyce, S. E. (1988). The influence of phonological structure on articulatory organization in Turkish and in English: Vowel harmony and coarticulation. Unpublished doctoral dissertation, Yale University.

Brown, G. (1981). Consonant rounding in British English: The status of phonetic descriptions as historical data. In R. E. Asher & E. J. A. Henderson (Eds.), Towards a history of phonetics. Edinburgh: Edinburgh University Press.

Clements, G. N., & Sezer, E. (1982). Vowel and consonant disharmony in Turkish. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations. Dordrecht: Foris.

Cohn, A. C. (1988). Phonetic evidence for configuration constraints. Presented at the 19th Meeting of the North East Linguistic Society (NELS), November, 1988.

Daniloff, R. G., & Moll, K. L. (1968). Coarticulation of lip rounding. Journal of Speech and Hearing Research, 11, 707-721.

Engstrand, O. (1981). Acoustic constraints or invariant input representation? An experimental study of selected articulatory movements and targets. Reports from Uppsala University, Department of Linguistics, 7, 67-94.

Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing control. Journal of Phonetics, 8, 113-133.

Gay, T. J. (1978). Articulatory units: Segments or syllables? In A. Bell & J. B. Hooper (Eds.), Syllables and segments. Amsterdam: North-Holland Press.

Gelfer, C. E., Bell-Berti, F., & Harris, K. S. (1989). Determining the extent of coarticulation: Effects of experimental design. Journal of the Acoustical Society of America, 86, 2443-2445.

Harris, K. S., & Bell-Berti, F. (1984). On consonants and syllable boundaries. In L. Raphael, C. Raphael, & M. Valdovinos (Eds.), Language and cognition. New York: Plenum Publishing.

Henke, W. L. (1966). Dynamic articulatory model of speech production using computer simulation. Unpublished doctoral dissertation, Massachusetts Institute of Technology.

Keating, P. (1985). CV phonology, experimental phonetics and coarticulation. UCLA Working Papers in Phonetics, 62, 1-13.

Keating, P. (1988). Underspecification in phonetics. Phonology, 5, 275-292.

Kozhevnikov, V. A., & Chistovich, L. A. (1966). Rech, artikulatsiya, i vospriyatiye (Speech: Articulation and perception) (originally published 1965). Washington, DC: Joint Publications Research Service.

Ladefoged, P. (1975). A course in phonetics. New York: Harcourt Brace Jovanovich.

Leidner, D. R. (1973). An electromyographic and acoustic study of American English liquids. Unpublished doctoral dissertation, University of Connecticut, Storrs.

Lewis, G. L. (1967). Turkish grammar. Oxford: Clarendon Press.

Liberman, A. M., & Studdert-Kennedy, M. (1977). Phonetic perception. In R. Held, H. Leibowitz, & H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII: Perception. Heidelberg: Springer-Verlag.

Löfqvist, A. (1989). Speech as audible gestures. Paper presented at the NATO Advanced Study Institute Conference on Speech Production and Speech Modeling, Bonas, France.

Lubker, J. (1981). Temporal aspects of speech production: Anticipatory labial coarticulation. Phonetica, 38.

Lubker, J., & Gay, T. (1982). Anticipatory labial coarticulation: Experimental, biological, and linguistic variables. Journal of the Acoustical Society of America, 71, 437-447.

Magen, H. (1984). Vowel-to-vowel coarticulation in English and Japanese. Journal of the Acoustical Society of America, 75, S41.

Manuel, S. Y. (1990). The role of output constraints in vowel-to-vowel coarticulation. Submitted to Journal of the Acoustical Society of America.

Martin, J. G., & Bunnell, H. T. (1982). Perception of anticipatory coarticulation effects in vowel-stop consonant-vowel sequences. Journal of Experimental Psychology: Human Perception and Performance, 8, 473-488.

McAllister, J. (1978). Temporal asymmetry in labial coarticulation. Papers from the Institute of Linguistics, 35, 1-29 (University of Stockholm, Stockholm).

Ohala, J. J. (1981). The listener as a source of sound change. In C. Masek, R. Hendrick, & M. F. Miller (Eds.), Papers from the parasession on language and behavior. Chicago: Chicago Linguistic Society.

Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.

Perkell, J. S. (1986). Coarticulation strategies: Preliminary implications of a detailed analysis of lower lip protrusion movements. Speech Communication, 5, 47-68.

Recasens, D. (1985). Coarticulatory patterns and degrees of coarticulatory resistance in Catalan CV sequences. Language and Speech, 28, 97-114.

Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.

Saltzman, E., Rubin, P. E., Goldstein, L., & Browman, C. P. (1987). Task-dynamic modeling of inter-articulator coordination. Journal of the Acoustical Society of America, 82, S15.

Sussman, H. M., & Westbury, J. R. (1981). The effects of antagonistic gestures on temporal and amplitude parameters of anticipatory labial coarticulation. Journal of Speech and Hearing Research, 24, 16-24.

Tuller, B., Kelso, J. A. S., & Harris, K. S. (1982). Interarticulator phasing as an index of temporal regularity in speech. Journal of Experimental Psychology: Human Perception and Performance, 8, 460-472.

FOOTNOTES

*To appear in Journal of the Acoustical Society of America.
†Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology.
1 The models tend to differ in the level at which compatibility is assessed. In Henke's program, the limitation was explicitly defined on an articulatory basis. Other investigators (Cohn, 1988; Keating, 1988) have postulated that coarticulation spreads by reference to feature specification at the phonological level.
2 Rotational head movement could theoretically affect the results in two ways. First, the LEDs might have been moved into a more peripheral area of the focus field where tracking is less accurate. Subjects were constantly monitored against this possibility during the course of the experiment, and videotaped experiments were checked post-hoc. Second, rotation of the head changes the relationship between the vertical and horizontal axes of the LED tracking system (the X and Y coordinates) and the subjects' sagittal midline. Thus, less or more of the subjects' anterior-posterior movement relative to the midline may be detected. Note, however, that in all cases


where a subject's amplitude of movement changed over the course of the experiment, this change was mirrored in the corresponding EMG signal, suggesting that baseline change was not a significant factor.

3 Interestingly, for subjects IB and CK, the EMG pattern for u-u words resembled that for u-i; the pattern for i-u words showed a strong peak associated with V2. For subject EG, on the other hand, the pattern for /kuktluk/ resembled that for /kiktluk/, while the patterns for shorter words /kukuk/, /kuluk/, etc. resembled those for /kukik/, /kulik/, etc. EMG traces for these words are reported in Boyce (1988).

4 To some extent this difference between OOS and OOI signals is a consequence of averaging, as token traces for the two signals showed differing proportions of double- and triple-peaked patterns.

5 Perusal of the traces for /kikik/, /kitik/ and /kilik/ suggests that NM shows some lower lip protrusion for each intervocalic consonant. For MB the lower lip trace shows marked protrusion for /t/ and some protrusion for /l/, while the upper lip shows protrusion only for /l/. Each of these protrusion peaks matches the consonant-interval peak of the corresponding u-u word.

6 Other comparisons of JX, JY, LLX and LLY signals, as well as observations of the videotape, suggest that these speakers use forward movement of the jaw (both rotational and translational) as well as lip movement to produce protrusion during rounded vowels. This topic is discussed further in Boyce (1988).

7 One intractable problem with this procedure is that /i/ and /u/ vowels also should have characteristic patterns. Thus, if retraction for /i/ vowels is present it will be subtracted from protrusion for /u/ vowels at the same time the consonant protrusion is subtracted. In these data, the relative magnitude of protrusion dwarfed that of retraction. Thus, it was assumed that the effects of subtracting the one outweighed the effects of the other.
8 As in the first test, some residue of possible /i/ vowel-associated movement remains in these constructed traces.

9 Note that the theory of independent gestures does not require that all gestures have identical amplitude or be produced with identical force. Such a requirement would leave no room for the effect of fatigue or for prosodic variables such as stress and syllable position. It is a common observation, for instance, that EMG or movement signals for the same word may show less amplitude at later stages during the same experiment. In this study, the fact that i-u, u-u, u-i and i-i utterances were blocked separately may have caused some differences in overall amplitude among them.

APPENDIX

Turkish has eight vowels /i ɨ a e o ø u y/ and thus (like Swedish but unlike English) has vowels which contrast only in rounding. The consonants /k/ and /l/ are non-labial and phonemically unrounded in both languages (Ladefoged, 1975; Lewis, 1967). English and Turkish have somewhat different patterns of allophonic variation for /k/ and /l/. In Turkish /k/ and /l/ tend to be front or back according to the front/backness of the vowel of the same syllable (Clements & Sezer, 1982). In contrast, for most English dialects syllable-initial /l/ is front and syllable-final /l/ is back, i.e. velarized (Keating, 1985; Ladefoged, 1975), while /k/ varies primarily in syllable-initial position, becoming front before front vowels and back before back vowels. (Although it is sometimes referred to as 'consonant harmony,' the Turkish rule for /l/ and /k/ is distinct from that for front/back harmony in vowels.) Neither /l/ nor /k/ participates in roundness harmony. The sequence /ktl/ is rare in both languages, but exists, cf. English tactless and Turkish /paktlar/ 'pacts.' Neither language allows the initial cluster /tl/; therefore, /kuktluk/ would have the syllable structure /kukt-luk/ in both languages.


Haskins Laboratories Status Report on Speech Research
1989, SR-99/100, 19-37

Long Range Coarticulatory Effects for Tongue Dorsum Contact in VCVCV Sequences*

Daniel Recasens†

The goal of this paper is to gather accurate information about the temporal and spatial properties of tongue dorsum movement in running speech. Electropalatographic and acoustical data were collected to measure lingual coarticulation over time. Coarticulatory effects were measured along VC[ə]CV utterances for articulations differing in the degree of tongue dorsum contact, namely, for vowels [i] vs. [a] and for consonants [ʃ] vs. [t]; the contextual phonemes were all possible combinations of those same consonants and vowels. Results show contrasting mechanisms for anticipatory and carryover coarticulation. Anticipatory effects appear to be more tightly controlled than carryover effects presumably because of phonemic preplanning; accordingly, gestural antagonism in the contextual phonemes affects the two coarticulatory types differently. The relevance of these data with respect to theories of coarticulation and speech production modeling is discussed.

I. INTRODUCTION

A large body of experimental evidence in the phonetics literature shows that the articulatory gestures for successive phonemic units are coarticulated in running speech and thus overlap in time. Many studies of coarticulation aim at reaching some understanding about the mechanisms of phonemic realization in speech production. A relevant goal of this research is to separate articulatory events resulting from motor programming strategies, from others related to the mechanical and physical properties of the speech production system. This paper approaches the nature of these two components through an analysis of tongue dorsum coarticulation over time, under the theoretical assumption that the phonemic gestures are programmed to exhibit certain temporal and spatial patterns, and that those patterns prevail to a large extent across

I wish to thank C. A. Fowler, K. S. Harris and I. G. Mattingly for their valuable suggestions on particular aspects of this research. I would also like to thank R. Arens Krakow, E. Bateson, V. Gunnar, and R. McGowan for technical help. This research was supported by NINCDS Grant NS-13817 to Haskins Laboratories, a postdoctoral fellowship from the Spain-U.S. Committee, research grant PB87-0774402-01 from the Spanish Government (DGICYT), and a fellowship from the Catalan Government (CIRIT Commission).

changes in phonemic context, speech rate and speaker (Harris, 1984; MacNeilage & DeClerk, 1969).

The research to be reported here is relevant to a number of significant issues.

A. Coarticulation at a distance

One of the issues of interest in coarticulation studies is the temporal domain of gestural activity for a given phonemic unit. Early research suggested that jaw (Gay, 1977; Sussman et al., 1973) and lingual activity (see references in section I.C) did not extend more than one or two segments beyond the target phoneme; more recent acoustic evidence shows however that such coarticulatory effects may last for three or four phonemes. Thus, long range temporal effects have been found for American English in [iləlV] (V1-to-V3 effects; Huffman, 1986) and in [VbəbV] (V3-to-V1 effects; Magen, 1989) sequences. The present paper intends to replicate these findings through an analysis of coarticulation at a distance in VCVCV utterances. Analogously to Huffman's and Magen's studies, V2 was kept constant at [ə] to facilitate long range effects over time since the schwa is highly sensitive to coarticulation from the adjacent segments (Recasens, 1985). Another goal of this research is to find out whether long range coarticulatory effects occur in other languages besides English.


B. V- and C-dependent coarticulation

Evidence for transconsonantal V-to-V coarticulation, and for greater V-to-C effects than C-to-V effects (e.g., MacNeilage & DeClerk, 1969), suggests that consonantal gestures unspecified for tongue-dorsum activity (e.g., labials and dentoalveolars) are produced with reference to a continuous V-to-V cycle (Fowler, 1984; Öhman, 1966). The implication of this view is that the domain of consonant related coarticulation is more constrained than the domain of vowel related coarticulation. V-to-V effects ought to be larger than C-to-C effects since the phonemic string is organized according to underlying gestures of a diphthongal nature.

To test this theory I will study coarticulatory effects from vowels and consonants differing in degree of linguopalatal contact along VC[ə]CV sequences. Thus, C-dependent effects will be analyzed for the palatoalveolar consonant [ʃ] vs. the apicodental or apicoalveolar consonant [t], and V-dependent effects for the palatal vowel [i] vs. the non-palatal vowel [a]. If the hypothesis is correct, C-dependent effects ought not to extend beyond V2=[ə]. Notice however that since the schwa has been denied a vocal tract target of its own (Catford, 1977) the availability of C-to-C effects across [ə] would not invalidate a strong version of the theory. However, it would be critical for the theory if the temporal domain of the C-dependent effects exceeded that of the V-dependent effects.

C. Gestural antagonism

Theories of coarticulation agree that gestural activity corresponding to phonemic units does not take place across antagonistic gestures (see section I.D). The problem is that no clear formulation of what is meant by gestural antagonism is available in most cases (Bell-Berti & Harris, 1981; Fowler, 1984). It is proposed in this paper that the degree of tongue dorsum coarticulation is inversely related to the involvement of the dorsum of the tongue in making a closure or a constriction. Data for consonants and vowels in the literature offer some support for this hypothesis.

Anticipatory V-to-V effects for tongue dorsum activity in VCV utterances have been found by Butcher and Weiher (1976), and Alfonso and Baer (1982), but much less so or not at all by Carney and Moll (1971), Gay (1977), Parush et al. (1983), and Farnetani et al. (1985). The explanation underlying those contrasting findings lies partly in the articulatory characteristics of the

intervocalic consonant. Thus, V-to-V effects occur mainly for consonants for which the dorsum of the tongue is not involved in the making of a closure or a constriction ([p]: Alfonso and Baer (1982); [t]: Butcher and Weiher (1976)). Velars, on the other hand, were found to block V-to-V effects to a much larger extent than labials and dentoalveolars in some of the previous studies. I have shown in this respect that the degree of transconsonantal V-to-V coarticulation is related inversely and monotonically to the degree of tongue-dorsum contact, for more palatal-like vs. more alveolar-like consonants (Recasens, 1984).

The articulatory characteristics of vowels also affect the extent of V-to-V coarticulation over time. Like palatal consonants, palatal vowels are more resistant to tongue dorsum coarticulation than vowels showing no constriction at the palatal place of articulation. Thus, smaller V-to-V effects on [i] than on [a] have been reported by Gay (1977), Butcher and Weiher (1976), and Recasens (1984).

This study will look into the extent to which coarticulation is blocked during the production of consonants and vowels involving different degrees of palatal contact, namely, [ʃ] and [t], and [i] and [a]. The hypothesis that coarticulation is inversely related to the degree of tongue dorsum contact for the contextual gestures will be tested across adjacent and distant phonetic segments; for that purpose, all possible combinations of consonants [ʃ] and [t], and vowels [i] and [a] in VC[ə]CV sequences will be submitted to experimental analysis.

D. Anticipatory vs. carryover coarticulation

Another issue of interest in the present study is the nature of the anticipatory and the carryover effects.

The sequencing of these two coarticulation types with respect to the target phoneme suggests that the anticipatory effects ought to reflect phonemic preplanning, and that the carryover effects ought to be mainly determined by the articulatory properties of the target and contextual phonemes. However, as shown in section I.C, the anticipation of tongue dorsum activity is not independent of the production characteristics of the preceding phonemes. Therefore, possible candidates for preplanning mechanisms would be those regularities in onset time of articulatory activity that can be shown to depend minimally on the articulatory properties of the preceding phonemic string.


Theories exist about what those regularities may be. They have been devised, however, to account mostly for velar and labial activity, and not so much for tongue movement (Sussman & Westbury, 1981). Moreover, they disagree as to whether anticipatory effects may affect an unlimited number of non-antagonistic gestures (Henke, 1966), or whether they are locked in time to the target gesture provided that no articulatory conflict is involved (Bell-Berti & Harris, 1981). Further refinements need to be carried out on theories of coarticulation to accommodate differences in coarticulatory activity for different articulators as well as differences in gestural antagonism associated with the contextual phonemes.

Some findings reported in coarticulation studies are relevant with respect to the contrasting nature of anticipatory vs. carryover effects for tongue dorsum activity: (a) anticipatory effects are essentially temporal and their onset shows little variability, while carryover effects are essentially spatial and more variable (Gay, 1977; Parush et al., 1983); (b) carryover effects may be larger than anticipatory effects (Farnetani et al., 1985; Recasens, 1984) but also smaller (Butcher & Weiher, 1976). Acoustic evidence for larger carryover vs. anticipatory effects (Fowler, 1981; Huffman, 1986) and vice versa (Magen, 1989) is also available.

One can argue that while the findings reported in (a) result from the preplanned vs. mechanical nature of anticipatory vs. carryover coarticulation, respectively, those reported in (b) are dependent on differences in speech rate, speaker and phonemic context. Thus, higher speech rates are likely to bring about an increase in the amount of anticipatory coarticulation while reducing the degree of antagonism from the preceding gestures; on the other hand, slower speech rates presumably cause the mechanical constraints for the preceding gestures to overcome those anticipatory effects associated with the preplanning of the target phoneme. Therefore one might plausibly argue for a progressive increase of anticipatory vs. carryover effects as speech rate increases, and of carryover vs. anticipatory effects as speech rate decreases. Similarly, the relative salience of the anticipatory vs. carryover effects may depend on the degree of articulatory constraint associated with the gestures preceding and following the target phoneme: anticipatory effects are likely to prevail upon carryover effects when the phonetic segments following the target phoneme involve higher gestural requirements

than those preceding it; on the other hand, carryover effects should be larger than anticipatory effects if the phonetic segments preceding the phonemic target are more constrained than those following it.

Within this framework, the present study is concerned with the nature and the relative salience of the anticipatory vs. carryover modes of coarticulation across segments showing different degrees of coarticulatory resistance.

II. METHOD

A. Articulatory analysis

Electropalatography (EPG) was used to analyze tongue dorsum contact over time. Electropalatographic data were collected for all possible VC[ə]CV combinations with C=[ʃ], [t] and V=[i], [a], and stress on the last syllable; all sequences were embedded in a "p__p" carrier environment.

Subjects read the list of sequences listed in Table 1. Two assumptions underlie the ordering of those sequences. In the first place, the degree of contextual interference with respect to coarticulatory effects ought to decrease as we move away from the target phoneme. Thus, e.g., V1-dependent effects ought to be more heavily influenced by the articulatory characteristics of C1 than by those of the phonetic segments placed at the other side of [ə] (i.e., C2 and V3). Secondly, phonetic segments involving more palatal contact (i.e., [ʃ] and [i]) ought to conflict with coarticulatory effects to a larger extent than those involving a lesser degree of palatal contact (i.e., [t] and [a]).

Table 1. List of sequences used in the experiment. Stress was placed on the last syllable.

1. piʃəʃip    5. piʃətip    9. pitəʃip    13. pitətip
2. paʃəʃip    6. paʃətip    10. patəʃip   14. patətip
3. piʃəʃap    7. piʃətap    11. pitəʃap   15. pitətap
4. paʃəʃap    8. paʃətap    12. patəʃap   16. patətap

A composite account of the two assumptions explains the particular ordering of the utterances in Table 1. Thus, the utterances in the table are ordered for decreasing degrees of resistance to carryover coarticulation associated with V1. In the first place, the offset of V1-dependent effects for [i] vs. [a] is expected to occur earlier for sequences 1 through 8 than for sequences 9 through 16 since C1=[ʃ] in the first set of utterances and C1=[t] in


the second. Within each set of sequences with a different C1, V1-dependent carryover effects ought to last for a shorter time when C2=[ʃ] than when C2=[t] (sequences 1 through 4 vs. 5 through 8, and 9 through 12 vs. 13 through 16). Finally, within each set of sequences showing a different C1 and a different C2, V1-dependent carryover effects ought to last less long when V3=[i] than when V3=[a] (sequences 1 and 2 vs. 3 and 4, 5 and 6 vs. 7 and 8, and so on).

A Catalan speaker from the Barcelona region (Re), and two American English speakers from New York City (Ra, Ba), repeated all utterances ten times with the artificial palate in place. Simultaneous recordings were made of the EPG and the acoustic signals. The American English speakers were asked to avoid flapping of phonemic /t/ and to make an alveolar stop instead; phonetic perception and inspection of the EPG data revealed that the consonant was always [t].

The electropalatographic system (Rion Electropalatograph model DP-01) has been described elsewhere (Recasens, 1984; Shibata et al., 1978). It is equipped with 63 electrodes arranged in five semicircular rows (see Figure 1) and allows displaying one pattern of contact every 15.6 ms. As shown in the figure, the electrodes can be grouped in four articulatory regions (i.e., alveolar, prepalatal, mediopalatal and postpalatal) for data analysis. The figure also shows that some electrodes are located along a median line; since electrodes on this line belong neither to the right nor to the left side of the palate, they were assigned to both sides when necessary. One can see that the number of electrodes decreases as rows become more central; thus, the outermost row has 8.5 electrodes on each side and the innermost row has 3.5.

I:""rr/ 1%-.-n-'--:"::--i1 2 3 4 5

Rowsof

electrodes

Figure 1. Electroplate.

ALVEOLAR RE 310N

PREPALATAL REGION

MEDIOPALATAL REGION

POSTPALATAL REGION

B. Acoustical analysis

The acoustical data were digitized at a sampling rate of 10 kHz, after pre-emphasis and low-pass filtering. An LPC program included in the ILS (Interactive Laboratory System) package was available for spectral analysis. F2 data were collected at the measurement points of interest and averaged across repetitions. F2 measurements were preferred to other measurements (e.g., F3 and/or F1) since, as shown in the literature, there is a good correlation between F2 and tongue placement for vowels (e.g., Alfonso & Baer, 1982; Fant, 1960).
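The ILS routines themselves are not reproduced here, but the standard autocorrelation-method LPC recipe for formant estimation that such programs implement can be sketched as follows (a minimal illustration assuming the paper's 10 kHz sampling rate; the window, pre-emphasis constant, LPC order and candidate filters are conventional choices, not values reported in the paper):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC: solve the Toeplitz normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))          # prediction polynomial A(z)

def formant_candidates(frame: np.ndarray, fs: float = 10_000.0, order: int = 12):
    """Sorted formant-frequency candidates (F1, F2, ...) for one vowel frame."""
    frame = frame * np.hamming(len(frame))                       # taper
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]           # one member of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -(fs / np.pi) * np.log(np.abs(roots))
    # Keep well-damped poles at plausible formant frequencies.
    return sorted(f for f, b in zip(freqs, bws) if f > 90.0 and b < 400.0)

# F2 at a labeled measurement point would then be the second candidate, e.g.:
# f1, f2, *rest = formant_candidates(samples_at_v2_midpoint)
```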

C. Measurement points in time

The points in time selected for analysis are shown in Table 2 for the EPG and the F2 data.

Table 2. Measurement points in time for the EPG data (above) and the F2 data (below).

EPG data:
1. V1 midp.    4. V2 ons.     7. C2 midp.
2. V1 offs.    5. V2 midp.    8. V3 ons.
3. C1 midp.    6. V2 offs.    9. V3 midp.

F2 data:
1. V1 midp.
2. V1 midp./offs.
3. V2 midp.
4. V3 ons./midp.
5. V3 midp.

Measurement points for the EPG data were labeled on the acoustic waveform except for the [t] midpoint (see Figure 2). This was due to the fact that, except for the [t] closure, the articulatory events of interest could not be located satisfactorily on the EPG record; thus, for example, it was not possible to select the frame corresponding to the period of maximum postalveolar constriction for [ʃ] (e.g., especially when adjacent to the vowel [i] as exemplified in Figure 2). Vowel onsets and offsets were labeled at onsets and offsets of voicing; moreover, those low amplitude pitch pulses occurring after the onset of lingual closure for [t], as determined visually on the acoustic waveform, were excluded from the


vocalic period in the measurement procedure. Vowel midpoints were labeled halfway between the vowel onsets and offsets. The [ʃ] midpoint was located at the midpoint between the voicing offset for the preceding vowel and the voicing onset for the following vowel. Acoustic waveforms for all utterances were labeled at the points in time indicated in Table 2 and the EPG frames corresponding to those acoustic labels were used for data analysis.

The consonantal midpoint for [t] was established at the closure midpoint. Since the closure period for this consonant is easy to determine on the EPG record it was decided to use an articulatory rather than an acoustic criterion in this case. Thus, the [t] midpoint was the only frame or the medial frame of several successive frames showing all electrodes lighted up on row 1; when this is the case, maximal contact at the front of the alveolar region is achieved.
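A sketch of that articulatory criterion, assuming the contact data are held as a boolean frames-by-electrodes array and that the indices of the row-1 electrodes are known (the data layout is a hypothetical stand-in for the DP-01 record):

```python
import numpy as np

def t_closure_midpoint(frames: np.ndarray, row1_electrodes: list) -> int:
    """Return the medial EPG frame index among the successive frames in which
    every row-1 (frontmost alveolar) electrode is contacted, i.e. the [t]
    midpoint. `frames` is an (n_frames, n_electrodes) boolean array."""
    closed = np.flatnonzero(frames[:, row1_electrodes].all(axis=1))
    if closed.size == 0:
        raise ValueError("no frame shows a complete row-1 closure")
    return int(closed[closed.size // 2])        # medial frame of the closure run
```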


Acoustical analysis (F2 measurements) was performed at different points along the vowels, as shown in Table 2. The main purpose of the acoustical analysis was to supplement the EPG data and, in particular, to obtain more information about coarticulatory effects at the midpoint of [a] since linguopalatal contact for this vowel at this particular point in time was practically absent for two of the three speakers (see Figure 5). The location of the vowel's midpoint is the same as that used in the EPG analysis. A single point along V2=[ə] was chosen for analysis (i.e., V2 midpoint). The V1 offset and the V3 onset labels were not placed at the voicing endpoints since the frequency of the F2 peak shows a good amount of variability at these points in time; instead, F2 measurements were taken at equidistant points between the V1 midpoint and the V1 offset (i.e., V1 midp./offs.), and between the V3 onset and the V3 midpoint (i.e., V3 ons./midp.).


Figure 2. Exemplification of the line-up procedure between the acoustic labels and the EPG frames in the case of the utterance [paʃəʃip] (speaker Ra). See Table 2 for measurement points in time 1 through 9.


D. Measurement of palatal contact

To achieve an accurate estimate of degree of tongue dorsum contact at the palatal regions, the measure of contact was the number of "on" electrodes in the innermost three or four rows of the artificial palate. The main reason for this choice was that differences between the degree of palatal contact for [i] vs. [a] and for [ʃ] vs. [t] are more salient at the center of the prepalate, mediopalate and postpalate, than at the sides of these articulatory regions (see section III.A). This criterion ensures that coarticulatory effects correspond to the most distinctive measure of contrast in degree of dorsopalatal contact, and are maximally preserved during the production of the surrounding phonemes.

EPG data were gathered for rows 3, 4 and 5 for speakers Re and Ra, and for rows 2, 3, 4 and 5 for speaker Ba. This speaker-dependent contrast was due to the fact that linguopalatal contact was systematically more peripheral for speaker Ba than for speakers Re and Ra (see section III.A).
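A sketch of this contact index (the mapping from rows to electrode indices is a hypothetical stand-in for the DP-01 layout):

```python
import numpy as np

def dorsal_contact_index(frame: np.ndarray, rows: tuple, row_electrodes: dict) -> int:
    """Count the 'on' electrodes over the chosen innermost rows for one EPG
    frame (a length-63 boolean array); `row_electrodes` maps each row number
    (1 = outermost ... 5 = innermost) to its electrode indices."""
    indices = [i for r in rows for i in row_electrodes[r]]
    return int(frame[indices].sum())

# Rows (3, 4, 5) were used for speakers Re and Ra; rows (2, 3, 4, 5) for
# speaker Ba, whose linguopalatal contact was systematically more peripheral.
```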

III. RESULTS

A. Linguopalatal contact and coarticulatory resistance for consonants and vowels

1. Consonants

Figure 3 shows patterns of linguopalatal contact averaged across repetitions at the midpoint of C2=[ʃ] and C2=[t] for all speakers. Linguopalatal configurations in the figure correspond to the stressed CV syllables in sequences 1, 4, 13 and 16 of Table 1.

In Figure 3, tongue contact takes place between the contour lines and the sides of the palate; the central area was left untouched by the tongue. Contour lines connect averages of linguopalatal contact between rows. The electrodes on or behind the contour line along a given row have been "on" in 100% of contacts across repetitions. The degree of fronting for the contour line along the space between two adjacent electrodes on a given row is proportional to the contact average for the frontmost of the two electrodes; therefore the line lies closer to the frontmost electrode as the contact average for that electrode increases. Thus, in the case of the contour line for [ʃi] (speaker Ra), electrodes 1 through 6 on row 1 of the left side of the palate were lit up in 100% of occurrences (i.e., in all repetitions) and electrode 7 in 30% of occurrences (i.e., in three out of ten repetitions).
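The contact averages behind these contour lines are per-electrode contact rates across repetitions; a minimal sketch:

```python
import numpy as np

def contact_percentages(repetition_frames: np.ndarray) -> np.ndarray:
    """Per-electrode contact rate across repetitions at one measurement point.
    `repetition_frames` is an (n_repetitions, n_electrodes) boolean array;
    a value of 1.0 means lit in all repetitions, 0.3 means lit in three out
    of ten (as in the [ʃi] example for speaker Ra)."""
    return repetition_frames.mean(axis=0)
```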

The patterns of contact for [ʃ] show that the consonant is produced with a central groove along the postalveolar and palatal regions while there is a wide contact area at both sides of the palate. As for [t], it is produced with a complete closure at the front of the alveolar region (all speakers) and at the postalveolar region (speaker Ra); overall, the degree of lateral contact is less than for [ʃ].

Figure 4 is another display of the same EPG frames row by row. The bars in the figure stand for the overall number of "on" electrodes on each row. As to the issue of concern here, namely, the degree of tongue dorsum contact at the center of the palatal regions (on rows 3, 4 and 5; see Figure 1), a general trend is observed for [ʃ] to show more contact than [t]. However, while this is true before [a], it is not so much the case before [i]. Thus, the claim that [ʃ] shows more dorsal contact than [t] is always true in the adjacency of low vowels but not in the adjacency of high vowels. Moreover, as also shown in Figure 3, speaker Ba shows less central contact at the palatal regions than the other two speakers; thus, no contact at all was observed for [t] on rows 3, 4 and 5. For that reason, the EPG data for this speaker were obtained by computing the number of "on" electrodes on rows 2, 3, 4 and 5 (see section II.D) at the expense of including a few electrodes located at the postalveolar region on row 2.

The EPG data on Figure 3 confirm the hypothesis that [ʃ] is less sensitive than [t] to coarticulatory effects in tongue dorsum activity because it involves more palatal contact. Thus, while the degree of contact at the palatal regions is highly similar for [ʃa] and [ʃi] (i.e., little or no tongue dorsum lowering occurs for [ʃ] before [a]), [ti] shows more tongue dorsum contact towards the median line than [ta] (i.e., during the production of [t], the tongue dorsum allows some vertical displacement as a function of the adjacent vowel). This trend is highly consistent for all three speakers. It should be pointed out that manner of articulation requirements may contribute to the absence of coarticulatory effects in tongue dorsum activity for [ʃ]; as shown in the literature (McCutcheon et al., 1980; Wolf et al., 1976), the formation of the medial groove for fricatives involves a high degree of articulatory precision.

In summary, adjacent vowels cause more variability in tongue dorsum contact for [t] than for [ʃ], in accordance with the articulatory characteristics of these consonants.


Figure 3. Linguopalatal patterns at the midpoint of C2=[ʃ] and C2=[t] as a function of following [i] and [a] for speakers Re, Ra and Ba.

[Figure 4 appears here: the same EPG frames displayed row by row; bars give the overall number of "on" electrodes on each row at the midpoint of C2=[ʃ] and C2=[t].]


2. Vowels

Figure 5 shows patterns of linguopalatal contact averaged across repetitions at the midpoint of V3=[i] and V3=[a] for the three speakers. As for [ʃ] and [t] (Figure 3), only data for the stressed CV syllables in the symmetrical sequences are given.

Linguopalatal patterns show a larger area of contact at the center of the palatal regions and more tongue fronting for [i] than for [a]. The absence of linguopalatal contact for [a] in the case of speakers Ba and Ra shows that the vowel is lower in their New York City dialect than in Catalan.


Figure 5. Linguopalatal patterns at the midpoint of V3=[i] and V3=[a] as a function of preceding C2=[ʃ] vs. [t] for speakers Re, Ra and Ba.

Phonetic variability for [i] vs. [a] as a function of the immediately preceding consonant is analyzed in Figure 6. For each of the pairs of sequences indicated in the figure, significant differences in degree of palatal contact and F2 frequency at the p < 0.01 level of significance are given. Effects are plotted at two points in time, namely, at V3 onset and V3 midpoint (EPG data), and at V3 onset/midpoint and V3 midpoint (F2 data).


Figure 6. EPG data and F2 data on significant coarticulatory effects upon V3=[i] and [a] from preceding C2=[ʃ] vs. [t] at the p < 0.01 level of significance (speakers Re, Ra and Ba). Effects are given at the V3 onset and the V3 midpoint (EPG data), and at the V3 onset/midpoint and the V3 midpoint (F2 data). See Table 2 for measurement points in time.

The figure shows much larger consonant-dependent coarticulatory effects at the onset (EPG data) and the onset/midpoint (F2 data) of [a] than at the midpoint of [a] and along [i]. In the case of [a], these findings are due partly to the fact that the vowel onset is closer to the consonant than the vowel midpoint but also to the absence of linguopalatal contact at the midpoint of the vowel for speakers Ra and Ba. That there are no effects along V3=[i] for speakers Ra and Ba suggests that this vowel is more resistant to coarticulation than [a] because it is produced with much more palatal contact.

In summary, [i] shows a larger degree of palatal contact and is more resistant to coarticulatory effects than [a]. Data on vertical displacement for the tongue body during the production of [i] vs. [a] (Perkell & Cohen, 1987) are consistent with the findings reported here.


B. Long range coarticulatory effects

1. Vowel effects

la. Anticipatory coarticulationFigure 7 displays EPG and F2 data on

coarticulatory effects caused by V3 along thepreceding VCVC string for all speakers. Thosequences in the figure were ordered fordecreasing degrees of resistance to V3-dependentanticipatory effects, as explained in section ILA;thus, it was expected that these coarticulatoryeffects would decrease from (ilaji/al throughDitati/al as a function of the degree of articulatoryconstraint associated with the VCVC string.

The EPG data show that the extent of anticipatory coarticulation is clearly dependent on the degree of palatal contact for C2 in the case of speakers Re and Ra. Overall, when C2=[ʃ], anticipatory effects may be absent or may start during C2; on the other hand, when C2=[t], V3 anticipation goes back to V2=[ə]. Therefore, two essential modes of anticipatory coarticulation

appear to take place for these two speakers: later than V2=[ə] when C2=[ʃ]; at V2=[ə] when C2=[t]. These coarticulatory trends are very robust and show that the anticipatory effects are conditioned by the degree of articulatory constraint involved during the production of the preceding phonetic segment. No V3-dependent effects are available before V2=[ə]. Speaker Ba shows no effects for any of the sequences under analysis.

The F2 data are consistent to a large extent with the EPG data. For speakers Re and Ra, V3-dependent anticipatory effects take place during V2=[ə] when C2=[ʃ] and [t]; however, at least for speaker Re, C2=[ʃ] (but not C2=[t]) blocks V3-to-V2 effects in some cases. V3-dependent coarticulatory effects do not extend into V1. Speaker Ba shows almost no V3-to-V2 effects. There is no one-to-one correspondence between the EPG and the F2 data in this and in the next figures since F2 frequency is affected by changes in other articulatory dimensions besides linguopalatal contact.


Figure 7. EPG and F2 data on significant V3-dependent anticipatory effects along the entire VC[ə]C utterance at the p < 0.01 level of significance (speakers Re, Ra and Ba). See Table 2 for measurement points in time.


The anticipation of the V3 gesture is not affected by the degree of coarticulatory resistance associated with the segments located in the VC string before V2=[ə]. Thus, the V3-to-V2 effects across C2=[t] take place no later when C1=[ʃ] than when C1=[t]. In fact, contrary to initial expectations, the EPG data for speakers Re and Ra show an earlier onset of anticipatory coarticulation across C2=[t] when V1=[i] than when V1=[a].

In summary, for two of the speakers, the V3-to-V2 anticipatory effects are inversely dependent on the degree of tongue dorsum contact for C2; therefore, coarticulation takes place across C2=[t] but not across C2=[ʃ]. Moreover, effects are

ELAM

independent of the degree of articulatoryconstraint for the phonetic segments preceding lo].No coarticulatory effects at a distance (i.e.,exceeding the V3-to-V2 domain) were found; thus,the EPG and the F2 data show neither V3-to-C1nor V3-to-V1 effects.

1b. Carryover coarticulation

Figure 8 displays EPG and F2 data on coarticulatory effects caused by V1 along the following CVCV string for all speakers. The sequences in the figure have been ordered for decreasing degrees of resistance to carryover coarticulation associated with V1, as explained in section II.A.


Figure 8. EPG and F2 data on significant V1-dependent carryover effects along the entire CəCV utterance at the p < 0.01 level of significance (speakers Re, Ra and Ba). See Table 2 for measurement points in time.


The EPG data for speakers Re and Ra show clear similarities with the EPG data on anticipatory effects displayed in Figure 7. Overall, C1=[ʃ] allows lesser effects over time than C1=[t]. This pattern is not so clear for speaker Ba. Data across speakers show that the offset time of the V1-dependent carryover effects is not as constrained as the onset time of the V3-dependent anticipatory effects; while V1-dependent effects across C1=[ʃ] may extend into [ə] (and even further), those across C1=[t] may stop at C1. Also, in contrast with anticipatory effects in Figure 7, long range temporal effects are available; thus, V1-dependent carryover effects may last until C2 (mainly when C1=[t], as expected) and, occasionally, until V3.

The F2 data reveal a general trend for carryover coarticulatory effects to extend into [ə], more so when C1=[t] than when C1=[ʃ] for speakers Re and Ra but not for speaker Ba. No V1-to-V3 effects were found except in two instances for speaker Re. The fact that these two cases occur in contextual environments which were judged to be highly resistant to carryover coarticulation (i.e., sequences with C1=[ʃ]) suggests that additional explanations are needed.

A better account of the carryover effects reported in Figure 8 results from a more detailed analysis of the articulatory constraints involved during the production of the VCVCV sequences with C1=[ʃ]. The figure shows a consistent trend for the utterances [i/aʃəti] and [i/aʃəta] to allow lesser coarticulation than the utterances [i/aʃəʃi] and [i/aʃəʃa]; as pointed out above, this is also the case for the F2 data with regard to speaker Re. The pattern of carryover effects for the utterances [Vʃəti] and [Vʃəta] conforms to the pattern of V3 anticipation across C2=[ʃ] (Figure 7) in that coarticulation does not reach V2=[ə]. The fact that this pattern does not apply entirely to the sequences [Vʃəʃi] and [Vʃəʃa], and that moreover V1-dependent carryover effects may last longer than for the sequences with C1=[t], suggests the existence of a particular production strategy.

To investigate this issue I plotted the amount of dorsopalatal contact over time for the sequences [VʃəʃV] vs. the sequences [VʃətV] (speakers Ra and Ba) in Figure 9. As shown in the figure, V1-dependent effects from [i] vs. [a] take place during V2=[ə] when C2=[ʃ] but not when C2=[t]. For the sequences [VʃətV], an active production mechanism is required for the achievement of V2=[ə]. As compared to adjacent C1=[ʃ] and C2=[t], the production of this vowel involves a noticeable decrease in degree of dorsopalatal contact; moreover, the linguopalatal target for [ə] is highly independent of differences in the quality of V1. The production of the sequences [VʃəʃV] is characterized by a different strategy. No articulatory target for the schwa is available in this case; instead, tongue dorsum contact proceeds gradually from C1 to C2 through the vowel. The degree of linguopalatal contact stays high through the entire utterance for the [iʃəʃV] sequences, and increases from V1 to C2 in the [aʃəʃV] sequences. It can be concluded that the articulatory characteristics of C2 have a clear effect on the V1-dependent coarticulatory trends in VCVCV sequences with C1=[ʃ].

The EPG data in Figure 8 reveal that the C2V3 string may affect the V1-dependent carryover effects in other instances. According to the initial hypothesis, the sequences [VtəʃV] show smaller effects (mostly until C1 or V2) than the sequences [VtətV] (mostly until V2 or C2), clearly so for speaker Ra and, less so, for speakers Re and Ba. Figure 10 illustrates this point with data on the degree of dorsopalatal contact over time for these sequences according to speaker Ra. V1-dependent effects stop at the onset of V2=[ə] when C2=[ʃ], but extend across [ə] into C2 when C2=[t]. Therefore, a highly constrained C2 (i.e., [ʃ]) blocks V1-dependent coarticulatory effects, and a C2 which is unspecified for tongue dorsum contact allows these effects to take place over a long period of time.

In summary, as for anticipatory effects, carryover effects from V1 appear to be largely dependent on the degree of palatal contact for the adjacent phonetic segment (i.e., C1). Differently from anticipatory effects, the offset of carryover coarticulation is less constrained in time, may extend beyond V2=[ə], and is dependent on the articulatory characteristics of the phonemic string on the other side of [ə].

2. Consonantal effects

2a. Anticipatory coarticulation

Figure 11 displays EPG and F2 data on coarticulatory effects caused by C2 along the preceding VCV string for all speakers. The sequences have been ordered for decreasing degrees of resistance to anticipatory effects associated with C2, as explained in section II.A.

According to the figure, C2-dependent anticipatory effects over time for speaker Re occur more frequently for sequences ending in V3=[a] than for those ending in V3=[i].


[Figure 9: degree of dorsopalatal contact over time for the [VʃəʃV] vs. [VʃətV] sequences (speakers Ra and Ba); the original caption is not legible in the source scan.]



Figure 10. V1-dependent carryover effects in tongue dorsum contact over time for the sequences [VtəʃV] vs. [VtətV] according to speaker Ra. See Table 2 for measurement points in time.

The figure also shows that the EPG data for C2=[ʃ] and [t] before V3=[i] for the same speaker are not significantly different at the C2 midpoint (i.e., point in time 7). Anticipatory effects from C2=[ʃ] vs. [t] occur mainly when V3=[a], since there are clear differences in degree of tongue dorsum raising for the consonant in this context (see Figure 4); consonant-dependent anticipatory effects are small or non-existent when V3=[i], since this vowel causes the dorsum of the tongue for [t] to show a similar amount of raising to that for [ʃ] (see Figure 4). This finding shows that the availability and extent of anticipatory coarticulation is dependent on the particular state of the articulators during the production of the target phoneme.


Overall, for all three speakers, a strong tendency is observed for C2-dependent effects to begin as early as possible during V2=[ə] but rarely during the phonetic segments preceding V2=[ə]. Effects do not appear to be dependent on the articulatory characteristics of C1; thus, for example, the onset time of the C2-dependent anticipatory effects does not occur later when C1=[ʃ] than when C1=[t]. Data for speaker Ba look somewhat different from data for speakers Re and Ra in this respect; in this case, more facilitation of the C2-dependent anticipatory effects may take place when V1=[a] than when V1=[i].

The F2 data show no C2-dependent effects before V2=[ə]. In general, those utterances showing little coarticulation in linguopalatal contact allow no C2-to-[ə] effects in F2 frequency.

In summary, as for V3-dependent anticipatory effects, C2-dependent anticipatory effects are constrained to occur at the onset of V2=[ə], more so for speakers Re and Ra than for speaker Ba. Overall, they also are highly independent of the articulatory characteristics of distant segments in the string.

2b. Carryover coarticulation

Figure 12 displays EPG and F2 data on coarticulatory effects caused by C1 along the following VCV string for all speakers. The sequences in the figure have been ordered for decreasing degrees of resistance to carryover effects associated with C1, as explained in section II.A. Analogously to the data in Figure 11, the EPG data at the C1 midpoint show significant differences between [ʃ] and [t] after V1=[a] but not after V1=[i], for speakers Ra and Ba and, to a lesser extent, for speaker Re. As expected, the most reliable C1-dependent effects over time occur when V1=[a].


Figure 11. EPG and F2 data on significant C2-dependent anticipatory effects along the entire VCə utterance at the p < .01 level of significance (speakers Re, Ra and Ba). See Table 2 for measurement points in time.



Figure 12. EPG and F2 data on significant C1-dependent carryover effects along the entire əCV utterance at the p < .01 level of significance (speakers Re, Ra and Ba). See Table 2 for measurement points in time.

According to the figure, the offset time of the carryover effects appears to be affected by the articulatory characteristics of C2; thus, the EPG data (all speakers) and the F2 data (speakers Re and Ra) show a general tendency for coarticulation to last longer in the sequences [aCətV] than in the sequences [aCəʃV]. Long range effects from C1 are more salient than those from C2 in Figure 11; accordingly, the EPG and the F2 data in Figure 12 show instances of C1-to-C2 and C1-to-V3 coarticulation, mostly when C2=[t] and V3=[a], as expected.

In summary, as with vowel-dependent carryover vs. anticipatory effects, consonant-dependent carryover effects are more variable, may last longer, and are more affected by distant phonetic segments than consonant-dependent anticipatory effects.

IV. DISCUSSION AND CONCLUSIONS

It was found in this paper that the amount of coarticulation in tongue dorsum contact varies inversely with the degree of palatal contact in adjacent or intervening segments. Data reported in the Results section (III.A.1 and III.A.2) show larger variability in tongue-dorsum contact and in F2 frequency for [t] vs. [ʃ] as a function of vocalic context, and for [i] vs. [a] as a function of consonantal context. Data on long range coarticulatory effects show that the onset of V- and C-dependent anticipatory coarticulation is quite fixed in time, is conditioned


by the degree of tongue dorsum contact for the immediately preceding segment in the string (i.e., C2), and is speaker-dependent. Thus, the onset of the V3-dependent effects occurs during C2=[ʃ], but during the preceding V2=[ə] if C2=[t] (two speakers); for these same speakers the onset of the consonant-dependent effects takes place at the onset of V2=[ə]. Speaker Ba, on the other hand, shows no V3-dependent effects, and C2-dependent effects originating at the onset of [ə].

As compared with anticipatory effects, long range carryover effects from V1 and C1 were found to be quite more variable (i.e., offset times were not as regular as onset times for anticipatory effects) and to extend further in time (onto C2 and V3) when the distant context allowed so. Moreover, carryover effects were more sensitive than anticipatory effects to the degree of articulatory constraint for the contextual gestures. As for anticipatory coarticulation effects, they were conditioned by the degree of tongue dorsum contact for the adjacent segment (e.g., offset of vowel-dependent coarticulation occurs earlier when C1=[ʃ] than when C1=[t]). However, differently from anticipatory effects, distant segments were found to counteract or facilitate carryover coarticulation; thus, the V1-dependent effects last longer in sequences [VtətV] vs. [VtəʃV] and [VʃəʃV] vs. [VʃətV], and the offset of the C1-dependent effects takes place earlier when C2=[ʃ] than when C2=[t].

These data reveal that anticipatory and carryover effects are highly sensitive to the degree of dorsopalatal constriction exhibited by the adjacent phonetic segments in the string. Moreover, they suggest that, to a large extent, the degree of spreading of gestural activity over time may depend on some measure of degree of gestural antagonism in the adjacent and/or distant phonetic segments; it may be that the temporal domain of a gesture increases or decreases as the contextual gestures become more or less neutral, respectively, with respect to its production properties. Research reported in this study and in the recent literature (see Introduction section) provides good evidence for the notion of degree of gestural antagonism in the case of tongue dorsum raising towards the palate. Thus, the onset of gestural activity appears to take place earlier as the degree of gestural conflict for the preceding consonants decreases, for [ʃ] > [t] > [p]. Also, V1-to-V3 or V3-to-V1 effects in VCVCV sequences have been reported to occur only when V2 is highly neutral with respect to tongue body coarticulation (i.e., [ə]); consistently with this view, V-to-V effects across labial and dentoalveolar consonants in VCV sequences may not take place when the fixed vowel is not [ə] (see section I.C for references).

This issue of articulatory antagonism is crucial to elucidate the temporal domain of the coarticulatory effects (and of phonemic gestures in general), and the relative dominance of anticipatory vs. carryover coarticulation. In my view these two aspects are related as follows: the more the contextual gestures approach absolute neutrality with respect to the target gesture, the more anticipatory effects prevail over carryover effects. This situation takes place when large amounts of undershoot are available (e.g., in fast speech) and/or when preceding segments can adapt easily to the target gesture (e.g., to tongue positioning for an upcoming vowel during the production of one or more labial consonants). In such extreme circumstances anticipatory coarticulation may extend more than two segments in advance; thus, as mentioned in the Introduction section, Magen (1989) found V3-to-V1 effects in [VbəbV] sequences. Otherwise, as for /t/ in the present study (also for /l/ in Huffman, 1986) and at slow speech rates (probably in the case of speaker Ba, whose speech was the slowest of all speakers analyzed in this study), carryover effects extend over larger periods of time than anticipatory effects. Thus, in [VtətV] sequences, V- and C-dependent carryover effects were found to extend up to three (and to a lesser extent four) segments from the target and to block possible V- and C-dependent anticipatory effects before V2. I suggest that this is the case because [t] is not completely neutral with respect to tongue dorsum activity; indeed, the raising of the front of the tongue also involves some raising of the tongue dorsum.

While differences in the temporal extent of coarticulatory effects differing in directionality may be shown to depend on the degree of articulatory constraint for the adjacent gestures, a different account is needed to explain why the onset of anticipatory coarticulation is more fixed than the offset of carryover coarticulation. Thus, for example, no requirements on articulatory control may be called forth to explain why carryover effects are more dependent than anticipatory effects upon the articulatory characteristics of the distant phonemes in the string. Again, even though the articulatory configuration of V2=[ə] in the utterances of the present study is roughly equally affected by C1 and C2, the extent of V1-dependent coarticulation


is affected by the articulatory properties of C2, but the extent of V3-dependent coarticulation is not affected by the articulatory properties of C1. An obvious interpretation of this finding is that, while carryover effects may adapt to the articulatory requirements imposed upon the realization of distant phonetic segments, anticipatory effects do not do so. Instead, the onset of the anticipatory effects is required to occur at a specific moment in time or during the production of a given preceding phoneme. Together with data from other sources (Magen, 1989; Parush et al., 1983; Recasens, 1987), this may be the sort of regularity that we are looking for to state that anticipatory effects reflect motor planning to a larger extent than carryover effects.

The findings reported in this paper provide some support for a time-locked model of anticipatory coarticulation. Thus, the onset of V3-dependent effects across non-antagonistic C2=[t] was always found to occur at V2=[ə], and the onset of C2-dependent effects across non-antagonistic V2=[ə] was mostly detected at the onset of V2=[ə]. Clearly, more work is needed to determine how the onset time of articulatory activity for a given gesture changes as a function of the degree of gestural antagonism associated with the preceding phonemes in the string.

Some data reported in this and other coarticulation studies are relevant to Öhman's V-to-V model of coarticulation. Firstly, V-to-V effects barely take place if the intervening consonant is highly constrained and/or if the fixed transconsonantal vowel is not highly neutral with respect to the target vowel gesture (Recasens, 1987). Moreover, contrasting speaker-dependent behaviors may also be available; for example, it was found in the present study that V3-dependent anticipatory effects for speaker Ba did not reach the immediately preceding consonant (i.e., C2). Thirdly, as stated earlier, the temporal domain of anticipatory coarticulation may exceed two segments if the contextual gestures are highly neutral with respect to the target gesture; moreover, the particular nature of the carryover effects also facilitates long range effects beyond V2=[ə]. Finally, as shown in Figures 8 and 12 of this paper, V- and C-dependent carryover effects may land on a consonant (i.e., C2) showing a low degree of articulatory constraint. In agreement with the hypothesis of a V-to-V mode of production, in most cases V-dependent effects were found to exceed C-dependent effects. Thus, for two speakers, no C2-to-C1 anticipatory effects (even when C1=[t]) were available but only V3-to-V2 effects; of course, it can be claimed that [ə] is more neutral than [t] with regard to coarticulation in tongue dorsum activity, and that C2-to-C1 effects might have been found had C1 been [p] instead of [t]. For speaker Ba, however, while C2-to-[ə] effects occur systematically, no V3-dependent effects were found either upon V2=[ə] or even upon C2=[t]. It can be concluded that the availability of a V-to-V mode of production may vary in view of the strong and subtle dependence of coarticulation on gestural antagonism in the contextual phonemes.

REFERENCES

Alfonso, P. J., & Baer, T. (1982). Dynamics of vowel articulation. Language and Speech, 25, 151-174.

Bell-Berti, F., & Harris, K. S. (1981). A temporal model of speech production. Phonetica, 38, 9-20.

Butcher, A., & Weiher, E. (1976). An electropalatographic investigation of coarticulation in VCV sequences. Journal of Phonetics, 4, 59-74.

Carney, P. J., & Moll, K. L. (1971). A cinefluorographic investigation of fricative consonant-vowel coarticulation. Phonetica, 23, 193-202.

Catford, J. C. (1977). Fundamental problems in phonetics. Bloomington: Indiana University Press.

Fant, G. (1960). Acoustic theory of speech production. 's-Gravenhage: Mouton.

Farnetani, E., Vagges, K., & Magno-Caldognetto, E. (1985). Coarticulation in Italian /VtV/ sequences: A palatographic study. Phonetica, 42, 78-99.

Fowler, C. A. (1981). Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 24, 127-139.

Fowler, C. A. (1984). Current perspectives on language and speech production: A critical overview. In R. Daniloff (Ed.), Recent advances in speech, hearing and language (Vol. 3, pp. 193-218). San Diego: College Hill Press.

Gay, T. (1977). Cinefluorographic and electromyographic studies of articulatory organization. In M. Sawashima & F. S. Cooper (Eds.), Dynamic aspects of speech production (pp. 85-102). Tokyo: University of Tokyo Press.

Harris, K. S. (1984). Coarticulation as a component in articulatory description. In R. G. Daniloff (Ed.), Articulatory assessment and treatment issues (pp. 147-167). San Diego: College Hill Press.

Henke, W. L. (1966). Dynamic articulatory model of speech production using computer simulation. Doctoral dissertation, Massachusetts Institute of Technology.

Huffman, M. K. (1986). Patterns of coarticulation in English. UCLA Working Papers in Phonetics, 63, 26-47.

MacNeilage, P., & DeClerk, J. L. (1969). On the motor control of coarticulation in CVC monosyllables. Journal of the Acoustical Society of America, 45, 1217-1233.

Magen, H. (1989). An acoustic study of vowel-to-vowel coarticulation in English. Doctoral dissertation, Yale University.

McCutcheon, M., Hasegawa, A., & Fletcher, S. (1980). Effects of palatal morphology on /s, z/ articulation. Biocommunication Research Reports, University of Alabama, Birmingham, 3, 38-47.

Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.

Parush, A., Ostry, D. J., & Munhall, K. G. (1983). A kinematic study of lingual coarticulation in VCV sequences. Journal of the Acoustical Society of America, 74, 1115-1125.


Perkell, J., & Cohen, M. (1987). Token-by-token variation of tongue-body vowel targets: The effect of coarticulation. Journal of the Acoustical Society of America, 82, S17. (Abstract)

Recasens, D. (1984). Vowel-to-vowel coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America, 76, 1624-1635.

Recasens, D. (1985). Coarticulatory patterns and degrees of coarticulatory resistance in Catalan CV sequences. Language and Speech, 28, 97-114.

Recasens, D. (1987). An acoustical analysis of V-to-C and V-to-V coarticulatory effects in Catalan and Spanish VCV sequences. Journal of Phonetics, 15, 299-312.

Shibata, S., Ino, A., Yamashita, S., Hibi, S., Kiritani, S., & Sawashima, M. (1978). A new portable unit for electropalatography. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, University of Tokyo, 12, 5-10.

Sussman, H. M., MacNeilage, P. F., & Hanson, R. J. (1973). Labial and mandibular dynamics during the production of bilabial stop consonants. Journal of Speech and Hearing Research, 16, 397-420.

Sussman, H. M., & Westbury, J. R. (1981). The effects of antagonistic gestures on temporal and amplitude parameters of anticipatory labial coarticulation. Journal of Speech and Hearing Research, 24, 16-24.

Wolf, M., Fletcher, S., McCutcheon, M., & Hasegawa, A. (1976). Medial groove width during /s/ sound production. Biocommunication Research Reports, University of Alabama, Birmingham, 1, 57-66.

FOOTNOTES

*Submitted to Speech Communication.
†Also Universitat Autònoma de Barcelona.


Haskins Laboratories Status Report on Speech Research
1989, SR-99/100, 38-68

A Dynamical Approach to Gestural Patterning in Speech Production*

Elliot L. Saltzman and Kevin G. Munhall†

In this article, we attempt to reconcile the linguistic hypothesis that speech involves an underlying sequencing of abstract, discrete, context-independent units, with the empirical observation of continuous, context-dependent interleaving of articulatory movements. To this end, we first review a previously proposed task-dynamic model for the coordination and control of the speech articulators. We then describe an extension of this model in which invariant gestural units (gestural primitives) are identified with context-independent sets of parameters in a dynamical system having two functionally distinct but interacting levels. The intergestural level is defined according to a set of activation coordinates; the interarticulator level is defined according to both model articulator and tract-variable coordinates. In the framework of this extended model, coproduction effects in speech are described in terms of the blending dynamics defined among a set of temporally overlapping active units; the relative timing of speech gestures is formulated in terms of the serial dynamics that shape the temporal patterning of onsets and offsets in unit activations. Implications of this approach for certain phonological issues are discussed, and a range of relevant experimental data on speech and limb motor control is reviewed.

INTRODUCTION

The production of speech is portrayed traditionally as a combinatorial process that uses a limited set of units to produce a very large number of linguistically "well-formed" utterances (e.g., Chomsky & Halle, 1968). For example, /mæd/ and /dæm/ are characterized by different underlying sequences of the hypothesized segmental units /m/, /d/, and /æ/. These types of speech units are usually seen as discrete, static, and invariant across a variety of contexts.

We acknowledge grant support from the following sources: NIH Grant NS-13617 (Dynamics of Speech Articulation) and NSF Grant BNS-8520709 (Phonetic Structure Using Articulatory Dynamics) to Haskins Laboratories, and grants from the Natural Sciences and Engineering Research Council of Canada and the Ontario Ministry of Health to Kevin G. Munhall. We also thank Philip Rubin for assistance with the preparation of the figures in this article, and Nancy O'Brien and Cathy Alfandre for their help in compiling this article's reference section. Finally, we are grateful for the critical and helpful reviews provided by Michael Jordan, Bruce Kay, Edward Reed, and Philip Rubin.

Putatively, such characteristics allow speech production to be generative, because units of this kind can be concatenated easily in any order to form new strings. The reality of articulation, however, bears little resemblance to this depiction. During speech production, the shape of the vocal tract changes constantly over time. These changes in shape are produced by the movements of a number of relatively independent articulators (e.g., velum, tongue, lips, jaw, etc.). For example, Figure 1 (from Krakow, 1987) displays the vertical movements of the lower lip, jaw and velum for the utterance "It's a /bamib/ side." The lower lip and jaw cooperate to alternately close and open the mouth during /bamib/ while, simultaneously, the velum alternates between a closed and open posture. It is clear from this figure that the articulatory patterns do not take the form of discrete, abutting units that are concatenated like beads on a string. Rather, the movements of different articulators are interleaved into a continuous gestural flow. Note, for example, that velic lowering for the /m/ begins even before the lip and jaw complete the bilabial opening from the /b/ to the /a/.



Figure 1. Acoustic waveform and optoelectronically monitored vertical components of the articulatory trajectories accompanying the utterance "It's a /bamib/ side." (From Krakow, 1987; used with author's permission.)

In this article, we focus on the patterning of speech gestures,1 drawing on recent developments in experimental phonology/phonetics and in the study of coordinated behavior patterns in multi-degree-of-freedom dynamical systems. Our key questions are the following: How can one best reconcile traditional linguistic analyses (discrete, context-independent units) with experimental observations of speech articulation and acoustics (continuous, context-dependent flows)? How can one reconcile the hypothesis of underlying invariance with the reality of surface variability? We try to answer these questions by detailing a specific dynamical model of articulation. Our focus on dynamical systems derives from the fact that such systems offer a theoretically unified account of: a) the kinematic forms or patterns displayed by the articulators during speech; b) the stability of these forms to external perturbations; and c) the lawful warping of these forms due to changing system constraints such as speaking rate, casualness, segmental composition, or suprasegmental stress. For us the primary importance of the work lies not so much in the details of this model, but in the problems that can be delineated within its framework.2 It has become clear that a complete answer to these questions will have to address (at least) the following: 1) the nature of the gestural units or primitives themselves; 2) the articulatory consequences of partial or total temporal overlap (coproduction) in the activities of these units that results from gestural interleaving; and 3) the serial coupling among gestural primitives, i.e., the processes that govern intergestural relative timing and that provide intergestural cohesion for higher-order, multigesture units.

Our central thesis is that the spatiotemporal patterns of speech emerge as behaviors implicit in a dynamical system with two functionally distinct but interacting levels. The intergestural level is defined according to a set of activation coordinates; the interarticulator level is defined according to both model articulator and tract-variable coordinates (see Figure 2). Invariant gestural units are posited in the form of relations between particular subsets of these coordinates and sets of context-independent dynamical parameters (e.g., target position and stiffness). Contextually-conditioned variability across different utterances results from the manner in which the influences of gestural units associated with these utterances are gated and blended into ongoing processes of articulatory control and coordination. The activation coordinate of each unit can be interpreted as the strength with which the associated gesture "attempts" to shape vocal tract movements at any given point in time. The tract-variable and model articulator coordinates of each unit specify the particular vocal-tract constriction (e.g., bilabial) and set of articulators (e.g., lips and jaw) whose behaviors are directly affected by the associated unit's activation. The intergestural level accounts for patterns of


relative timing and cohesion among the activation intervals of gestural units that participate in a given utterance, e.g., the activation intervals for tongue-dorsum and bilabial gestures in a vowel-bilabial-vowel sequence. The interarticulator level accounts for the coordination among articulators evident at a given point in time due to the currently active set of gestures, e.g., the coordination among lips, jaw, and tongue during periods of vocalic and bilabial gestural coproduction.3


[Figure 2 diagram: a box labeled "Intergestural Coordination (gestural activation variables)" connected to a box labeled "Interarticulator Coordination (tract variables; model articulator variables)".]

Figure 2. Schematic illustration of the proposed two-level dynamical model for speech production, with associated coordinate systems indicated. The darker arrow from the intergestural to the interarticulator level denotes the feedforward flow of gestural activation. The lighter arrow indicates feedback of ongoing tract-variable and model articulator state information to the intergestural level.

In the following pages we take a stepwise approach to elaborating upon these ideas. First, we examine the hypothesis that the formation and release of local constrictions in vocal tract shape are governed by active gestural units that serve to organize the articulators temporarily and flexibly into functional groups or ensembles of joints and muscles (i.e., synergies) that can accomplish particular gestural goals. Second, we review a recent, promising extension of this approach to the related phenomena of coarticulation and coproduction (Saltzman, Rubin, Goldstein, & Browman, 1987). Third, we describe some recent work in a connectionist, computational framework (e.g., Grossberg, 1986; Jordan, 1986, in press; Lapedes & Farber, cited in Lapedes & Farber, 1986; Rumelhart, Hinton, & Williams, 1986) that offers a dynamical account of intergestural timing. Fourth, we examine the issue of intergestural cohesion and the relationships that may exist between stable multiunit ensembles and the traditional linguistic concept of phonological segments. In doing so, we review the work of Browman and Goldstein (1986) on their articulatory phonology. Fifth, and finally, we review the influences of factors such as speaking rate and segmental composition on gestural patterning, and speculate on the further implications of our approach for understanding the production of speech.

Gestural primitives for speech: A dynamical framework

Much theoretical and empirical evidence from the study of skilled movements of the limbs and speech articulators supports the hypothesis that the "significant informational units of action" (Greene, 1971, p. xviii) do not entail rigid or hard-wired control of joint and/or muscle variables. Rather, these units, or coordinative structures (e.g., Fowler, 1977; Kugler, Kelso & Turvey, 1980, 1982; Kugler & Turvey, 1987; Saltzman, 1986; Saltzman & Kelso, 1987; Turvey, 1977), must be defined abstractly or functionally in a task-specific, flexible manner. Coordinative structures have been conceptualized within the theoretical and empirical framework provided by the field of (dissipative) nonlinear dynamics (e.g., Abraham & Shaw, 1982, 1986; Guckenheimer & Holmes, 1983; Haken, 1983; Thompson & Stewart, 1986; Winfree, 1980). Specifically, it has been hypothesized (e.g., Kugler et al., 1980; Saltzman & Kelso, 1987; Turvey, 1977) that coordinative structures be defined as task-specific and autonomous (time-invariant) dynamical systems that underlie an action's form as well as its stability properties. These attributes of task-specific flexibility, functional definition, and time-invariant dynamics have been incorporated into a task-dynamic model of coordinative structures (Kelso, Saltzman & Tuller, 1986a, 1986b; Saltzman, 1986; Saltzman & Kelso, 1987; Saltzman et al., 1987). In the model, time-invariant dynamical systems for specific skilled actions are defined at an abstract (task space) level of system description. These invariant dynamics underlie and give rise to contextually-


dependent patterns of change in the dynamic parameters at the articulatory level, and hence to contextually-varying patterns of articulator trajectories. Qualitative differences between the stable kinematic forms required by different tasks are captured by corresponding topological distinctions among task-space attractors (see also Arbib, 1984, for a related discussion of the relation between task and controller structures). As applied to limb control, for example, gestures involving a hand's discrete motion to a single spatial target and repetitive cyclic motion between two such targets are characterized by time-invariant point attractors (e.g., as with a damped pendulum or damped mass-spring, whose motion decays over time to a stable equilibrium point) and periodic attractors (limit cycles; e.g., as with an escapement-driven pendulum in a grandfather clock, whose motion settles over time to a stable oscillatory cycle), respectively.
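As a concrete illustration of these two attractor types, the following standard textbook forms (not equations from this article) may help: a damped mass-spring settles onto a point attractor at its equilibrium position, while a self-sustaining oscillator such as the van der Pol system settles onto a limit cycle.

$$m\ddot{x} + b\dot{x} + k(x - x_0) = 0 \qquad \text{(point attractor: } x \rightarrow x_0 \text{ as } t \rightarrow \infty\text{)}$$

$$\ddot{x} - \mu(1 - x^2)\dot{x} + x = 0, \quad \mu > 0 \qquad \text{(periodic attractor: motion settles onto a stable cycle)}$$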

Model articulator and tract variable coordinates

In speech, a major task for the articulators is to create and release constrictions locally in different regions of the vocal tract, e.g., at the lips for bilabial consonants, or between the tongue dorsum and palate for some vowels.4 In task dynamics, constrictions in the vocal tract are governed by a dynamical system defined at the interarticulator level (Figure 2) according to both tract variable (e.g., bilabial aperture) and model articulator (e.g., lips and jaw) coordinates. Tract variables are the coordinates in which context-independent gestural "intents" are framed, and model articulators are the coordinates in which context-dependent gestural performances are expressed. The distinction between tract variables and model articulators reflects a behavioral distinction evident in speech production. For example, in a vowel-bilabial-vowel sequence a given degree of effective bilabial closure may be achieved with a range of different lip-jaw movements that reflects contextual differences in the identities of the flanking vowels (e.g., Sussman, MacNeilage, & Hanson, 1973).

In task-dynamic simulations, each constriction type (e.g., bilabial) is associated with a pair (typically) of tract variables, one that refers to the location of the constriction along the longitudinal axis of the vocal tract, and one that refers to the degree of constriction measured perpendicularly to the longitudinal axis in the sagittal plane. Furthermore, each gestural/constriction type is associated with a particular subset of model articulators. These simulations have been implemented using the Haskins Laboratories software articulatory synthesizer (Rubin, Baer & Mermelstein, 1981). The synthesizer is based on a midsagittal view of the vocal tract and a simplified kinematic description of the vocal tract's articulatory geometry. Modeling work has been performed in cooperation with several of our colleagues at Haskins Laboratories as part of an ongoing project focused on the development of a gesturally-based, computational model of linguistic structures (Browman & Goldstein, 1986, in press; Browman, Goldstein, Kelso, Rubin, & Saltzman, 1984; Browman, Goldstein, Saltzman, & Smith, 1986; Kelso et al., 1986a, 1986b; Kelso, Vatikiotis-Bateson, Saltzman, & Kay, 1985; Saltzman, 1986; Saltzman et al., 1987).

Figures 3 and 4 illustrate the tract variables and articulatory degrees of freedom that are the focus of this article. In the present model, they are associated with the control of bilabial, tongue-dorsum, and "lower-tooth-height" constrictions.5 Bilabial gestures are specified according to the tract variables of lip protrusion (LP; the horizontal distance of the upper and lower lips to the upper and lower front teeth, respectively) and lip aperture (LA; the vertical distance between the lips). For bilabial gestures the four modeled articulatory components are: yoked horizontal movements of the upper and lower lips (LH), jaw angle (JA), and independent vertical motions of the upper lip (ULV) and lower lip (LLV) relative to the upper and lower front teeth, respectively. Tongue-dorsum gestures are specified according to the tract variables of tongue-dorsum constriction location (TDCL) and constriction degree (TDCD). These tract variables are defined as functions of the current locations, in head-centered coordinates, of the region of maximum constriction between the tongue-body surface and the upper and back walls of the vocal tract. The articulator set for tongue-dorsum gestures has three components: tongue-body radial (TBR) and angular (TBA) positions relative to the jaw's rotation axis, and jaw angle (JA). Lower-tooth-height gestures are specified according to a single tract variable defined by the vertical position of the lower front teeth, or equivalently, the vertical distance between the upper and lower front teeth. Its articulator set is simply jaw angle (JA). This tract variable is not used in most current simulations, but was included in the model to test hypotheses concerning suprasegmental control of the jaw (Macchi, 1985), the role of lower tooth height in tongue blade fricatives, etc.
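The pairings of constriction types, tract variables, and articulator sets just listed can be summarized in a small lookup structure. The sketch below simply restates the text in code form; the dictionary itself is an illustrative assumption, not a data structure from the model:

# Illustrative mapping from the constriction types discussed in the text to
# their tract variables and model articulator sets (a sketch only; the
# abbreviations follow the definitions given in the surrounding prose).
GESTURE_COORDINATES = {
    "bilabial": {
        "tract_variables": ["LP", "LA"],             # lip protrusion, lip aperture
        "articulators": ["LH", "JA", "ULV", "LLV"],  # yoked horizontal lips, jaw
                                                     # angle, upper/lower lip vertical
    },
    "tongue_dorsum": {
        "tract_variables": ["TDCL", "TDCD"],         # constriction location, degree
        "articulators": ["TBR", "TBA", "JA"],        # tongue-body radial/angular, jaw
    },
    "lower_tooth_height": {
        "tract_variables": ["LTH"],
        "articulators": ["JA"],                      # jaw angle only
    },
}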


Figure 3. Schematic midsagittal vocal tract outline, with tract-variable degrees of freedom indicated by arrows. (See text for definitions of tract-variable abbreviations used.)

[Figure 4 content, as recoverable from the scan: model articulator variables φi, i = 1, ..., 10: LH(φ1), JA(φ2), ULV(φ3), LLV(φ4), TBR(φ5), TBA(φ6), TTR(φ7), TTA(φ8), V(φ9), G(φ10); tract variables zi, i = 1, ..., 9: LP(z1), LA(z2), TDCL(z3), TDCD(z4), LTH(z5), TTCL(z6), TTCD(z7), VEL(z8), GLO(z9).]

Figure 4. Matrix representing the relationship between tract variables (z) and model articulators (φ). The filled cells in a given tract-variable row denote the model articulator components of that tract variable's articulatory set. The empty cells indicate that the corresponding articulators do not contribute to the tract variable's motion. (See text for definitions of abbreviations used in the figure.)


Each gesture in a simulated utterance is associated with a corresponding tract-variable dynamical system. At present, all such dynamical systems are defined as tract-variable point attractors, i.e., each is modeled currently by a damped, second-order linear differential equation (analogous to a damped mass-spring). The corresponding set of tract-variable motion equations is described in Appendix 1. These equations are used to specify a functionally equivalent dynamical system expressed in the model articulator coordinates of the Haskins articulatory synthesizer. This model articulator dynamical system is used to generate articulatory motion patterns. It is derived by transforming the tract-variable motion equations into an articulatory space whose components have geometric attributes (size, shape) but are massless. In other words, this transformation is a strictly kinematic one, and involves only the substitution of variables defined in one coordinate system for variables defined in another coordinate system (see Appendix 2).
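Although Appendix 1 is not reproduced in this excerpt, the description above (one damped, second-order linear equation per tract variable, analogous to a damped mass-spring) implies a sketch of the following form, where z is the vector of tract variables, z0 the vector of gesture-specific targets, and M, B, and K diagonal inertia, damping, and stiffness matrices; the exact notation of Equation (A1) may differ:

$$M\ddot{z} + B\dot{z} + K(z - z_0) = 0$$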

Using the model articulator dynamical system (Equation [A4] in Appendix 2) to simulate simple utterances, the task-dynamic model has been able to generate many important aspects of natural articulation. For example, the model has been used to reproduce experimental data on compensatory articulation, whereby the speech system quickly and automatically reorganizes itself when faced with unexpected mechanical perturbations (e.g., Abbs & Gracco, 1983; Folkins & Abbs, 1975; Kelso, Tuller, Vatikiotis-Bateson & Fowler, 1984; Munhall & Kelso, 1985; Munhall, Löfqvist & Kelso, 1986; Shaiman & Abbs, 1987) or with static mechanical alterations of vocal tract shape (e.g., Gay, Lindblom, & Lubker, 1981; MacNeilage, 1970). Such compensation for mechanical disturbances is achieved by readjusting activity over an entire subset of articulators in a gesturally-specific manner. The task-dynamic model has been used to simulate the compensatory articulation observed during bilabial closure gestures (Saltzman, 1986; Kelso et al., 1986a, 1986b). Using point-attractor (e.g., damped mass-spring) dynamics for the control of lip aperture, when the simulated jaw is "frozen" in place during the closing gesture, at least the main qualitative features of the data are captured by the model, in that: 1) the target bilabial closure is reached (although with different final articulator configurations) for both perturbed and unperturbed "trials," and 2) compensation is immediate in the upper and lower lips to the jaw perturbation, i.e., the system does not require reparameterization in order to compensate. Significantly, in task-dynamic modeling the processes governing intra-gestural motions of a given set of articulators (e.g., the bilabial articulatory set defined by the jaw and lips) are exactly the same during simulations of both unperturbed and mechanically perturbed active gestures. In all cases, the articulatory movement patterns emerge as implicit consequences of the gesture-specific dynamical parameters (i.e., tract-variable parameters and articulator weights; see Appendices 1 and 2), and the ongoing postural state (perturbed or not) of the articulators. Explicit trajectory planning and/or replanning procedures are not required.

Gestural activation coordinates

Task dynamics identifies several different time spans that are important for conceptualizing the dynamics of speech production. For example, the settling time of an unperturbed discrete bilabial gesture is the time required for the system to move from an initial position with zero velocity to within a criterion percentage (e.g., 2%) of the distance between initial and target positions. A gesture's settling time is determined jointly by the inertia, stiffness, and damping parameters intrinsic to the associated tract-variable point attractor. Thus, gestural duration or settling time is implicit in the dynamics of the interarticulator level (Figure 2, bottom), and is not represented explicitly. There is, however, another time span that is defined by the temporal interval during which a gestural unit actively shapes movements of the articulators. In previous sections, the concept of gestural activity was used in an intuitive manner only. We now define it in a more specific fashion.

Intervals of active gestural control are specified at the intergestural level (Figure 2, top) with respect to the system's activation variables. The set of activation variables defines a third coordinate system in the present model, in addition to those defined by the tract variables and model articulators (see Figures 2 & 5). Each distinct tract-variable gesture is associated with its own activation variable, a_ik, where the subscript i denotes numerically the associated tract variable (i = 1, ..., m), and the subscript k denotes symbolically the particular gesture's linguistic affiliation (k = /p/, /i/, etc.). The value of a_ik can be interpreted as the strength with which the associated tract-variable dynamical system "attempts" to shape vocal tract movements at any given point in time.


Figure 5. Example of the "anatomical" connectivity pattern defined among the model's three coordinate systems. BL and TD denote tract variables associated with bilabial and tongue-dorsum constrictions, respectively.

In current simulations the temporal patterning of gestural activity is accomplished with reference to a gestural score (Figure 6) that represents the activation of gestural primitives over time across parallel tract-variable output channels. Currently, these activation patterns are not derived from an underlying implicit dynamics. Rather, these patterns are specified explicitly "by hand," or are derived according to a rule-based synthesis program called GEST that accepts phonetic string inputs and generates gestural score outputs (Browman et al., 1986). In the gestural score for a given utterance, the corresponding set of activation functions is specified as an explicit matrix function of time, A(t). For purposes of simplicity, the activation interval of a given gesture-ik is specified according to the duration of a step-rectangular pulse in a_ik, normalized to unit height (a_ik ∈ {0, 1}). In future developments of the task-dynamic model (see the Serial Dynamics section later in the article), we plan to generalize the shapes of the activation waves and to allow activations to vary continuously over the interval (0 ≤ a_ik ≤ 1).6

[Figure 6 diagram: gestural score for /pab/, with activation boxes on parallel tract-variable channels (tongue-body constriction degree, lip aperture, glottal aperture) plotted against time (ms), overlaid with the generated tract-variable trajectories.]

Figure 6. Gestural score used to synthesize the sequence /pab/. Filled boxes denote intervals of gestural activation. Box heights are uniformly either 0 (no activation) or 1 (full activation). The waveform lines denote tract-variable trajectories generated during the simulation.
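To make the gestural-score representation concrete, here is a minimal sketch of an activation matrix A(t) built from step-rectangular pulses, one row per tract-variable channel, loosely patterned after the /pab/ score of Figure 6. The channel names and pulse timings are invented for illustration, and the GEST rules are not reproduced:

import numpy as np

# Hypothetical channels and activation intervals (seconds) for a /pab/-like
# score; each gesture is a step-rectangular pulse normalized to unit height.
CHANNELS = ["tongue_dorsum_CD", "lip_aperture", "glottal_aperture"]
INTERVALS = {
    "lip_aperture":     [(0.00, 0.12), (0.30, 0.42)],  # /p/ and /b/ closures
    "tongue_dorsum_CD": [(0.08, 0.34)],                # vowel /a/ gesture
    "glottal_aperture": [(0.00, 0.14)],                # devoicing for /p/
}

def gestural_score(duration=0.5, dt=0.005):
    """Return (t, A): A is an (n_channels x n_frames) 0/1 activation matrix."""
    t = np.arange(0.0, duration, dt)
    A = np.zeros((len(CHANNELS), t.size))
    for row, name in enumerate(CHANNELS):
        for (on, off) in INTERVALS.get(name, []):
            A[row, (t >= on) & (t < off)] = 1.0  # unit-height pulse
    return t, A

t, A = gestural_score()

Note how the lip-aperture and tongue-dorsum rows overlap in time; this overlap is exactly the coproduction discussed in the next section.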


COPRODUCTION

Temporally discrete or isolated gestures are at best rare exceptions to the rule of temporal interleaving and overlap (coproduction) among gestures associated with nearby segments (e.g., Bell-Berti & Harris, 1981; Fowler, 1977, 1980; Harris, 1984; Keating, 1985; Kent & Minifie, 1977; Öhman, 1966, 1967; Perkell, 1969; Sussman et al., 1973; for an historical review, see Hardcastle, 1981). This interleaving is the source of the ubiquitous phenomenon of coarticulation in speech production. Coarticulation refers to the fact that at any given point during an utterance, the influences of gestures associated with several adjacent or near-adjacent segments can generally be discerned in acoustic or articulatory measurements. Coarticulatory effects can occur, for example, when lip protrusion for a following rounded vowel begins during the preceding phonologically unrounded consonant, thereby coloring the acoustic correlates of the consonant with those of the following vowel. Similarly, in a vowel-/p/-vowel sequence the formation of the bilabial closure for /p/ (using the jaw and lips) appears to be influenced by temporally overlapping demands associated with the following vowel (using the jaw and tongue) by virtue of the shared jaw component.7 In the context of the present model (see also Coker, 1976; Henke, 1966), these overlapping demands can be represented as overlapping activation patterns in a corresponding set of gestural scores. The specification of gestural scores (either by hand or by synthesis rules) thereby allows rigorous experimental control over the temporal onsets and offsets of the activations of simulated gestures, and provides a powerful computational framework and research tool for exploring and testing hypotheses derived from current work in experimental phonology/phonetics. In particular, these methods have facilitated the exploration of coarticulatory phenomena that have been ascribed to the effects of partial overlap or coproduction of speech gestures. We now describe in detail how gestural activation is incorporated into ongoing control processes in the model, and the effects of coproduction in shaping articulatory movement patterns.

Active gestural control: Tuning and gating

How might a gesture gain control of the vocal tract? In the present model, when a given gesture's activation is maximal (arbitrarily defined as 1.0), the gesture exerts maximal influence on all the articulatory components associated with the gesture's tract-variable set. During each such activation interval, the evolving configuration of the model articulators results from the gesturally- and posturally-specific way that driving influences generated in the tract-variable space (Equation [A1], Appendix 1) are distributed across the associated sets of articulatory components (Equations [A3] and [A4], Appendix 2) during the course of the movement. Conversely, when the gesture's activation is minimal (arbitrarily defined as 0.0), none of the articulators are subject to active control influences from that gesture. What, then, happens to the model articulators when there is no active control? We begin by considering the former question of active control, and treat the latter issue of nonactive control below in the section Nonactive Gestural Control.

The driving influences associated with a given gesture's activation "wave" (see Figure 6) are inserted into the interarticulator dynamical system in two ways in our current simulations. The first way serves to define or tune the current set of dynamic parameter values in the model (i.e., K, B, x0, and W in Equations [A3] and [A4], Appendix 2; see also Saltzman & Kelso, 1983, 1987 for a related discussion of parameter tuning in the context of skilled limb actions). The second way serves to implement or gate the current pattern of tract-variable driving influences into the appropriate set of articulatory components. The current use of tuning and gating is similar to Bullock and Grossberg's (1988a, 1988b; see also Cohen, Grossberg & Stork, 1988; Grossberg, 1978) use of target specification and "GO" signals, respectively, in their model of sensorimotor control.

The details of tuning and gating processes depend on the ongoing patterns of overlap that exist among the gestures in a given utterance. The gestural score in Figure 6 captures in a descriptive sense both the temporal overlap of speech gestures as well as a related spatial type of overlap. As suggested in the figure, coproduction occurs whenever the activations of two or more gestures overlap partially (or wholly) in time within and/or across tract variables. Spatial overlap occurs whenever two or more coproduced gestures share some or all of their articulatory components. In these cases, the influences of the spatially and temporally overlapping gestures are said to be blended. For example, in a vowel-consonant-vowel (VCV) sequence, if one assumes that the activation intervals of the vowel and


consonant gestures overlap temporally (e.g., Öhman, 1967; Sussman et al., 1973), then one can define a continuum of supralaryngeal overlap that is a function of the gestural identity of the medial consonant. In such sequences, the flanking vowel gestures are defined, by hypothesis, along the tongue-dorsum tract variables and the associated articulatory set of jaw and tongue body. If the consonant is /h/, then there is no supralaryngeal overlap. If the consonant is /b/, then its gesture is defined along the bilabial tract variables and the associated lips-jaw articulator set. Spatial overlap occurs in this case at the shared jaw. If the consonant is the alveolar /d/, its gesture is defined along the tongue-tip tract variables and the associated articulator set of tongue tip, tongue body, and jaw. Spatial overlap occurs then at the shared jaw and tongue body. Note that in both the bilabial and alveolar instances the spatial overlap is not total, and there is at least one articulator free to vary, adaptively and flexibly, in ways specific to its associated consonant. Thus, Öhman (1967) showed, for a medial alveolar in a VCV sequence, that both the location and degree of tongue-tip constriction were unaffected by the identity of the flanking vowels, although the tongue-dorsum's position was altered in a vowel-specific manner. Finally, if the medial consonant in a VCV sequence is the velar /g/, the consonant gesture is defined along exactly the same set of tract variables and articulators as the flanking vowels. In this case, there is total spatial overlap, and the system shows a loss of behavioral flexibility. That is, there is now contextual variation evident even in the attainment of the consonant's tongue-dorsum constriction target; Öhman (1967), for example, showed that in such cases the velar's place of constriction was altered by the flanking vowels, although the constriction degree was unaffected.

Blending due to spatial and temporal overlap occurs in the model as a function of the manner in which the current gestural activation matrix, A, is incorporated into the interarticulator dynamical system. Thus, blending is implemented with respect to both the gestural parameter set (tuning) and the transformation from tract-variable to articulator coordinates (gating) represented in Equations (A3) and (A4). In the following paragraphs, we first describe the computational implementation of these activation and blending processes, and then describe the results of several simulations that demonstrate their utility.

Parameter tuning. Each distinct simulated gesture is linked to a particular subset of tract-variable and articulator coordinates, and has associated with it a set of time-invariant parameters that are likewise linked to these coordinate systems. For example, a tongue-dorsum gesture's stiffness, damping, and target parameters are associated with the tract variables of tongue-dorsum constriction location and degree; its articulator weighting parameters are associated with the jaw angle, tongue-body radial, and tongue-body angular degrees of freedom. Values for these parameters are estimated from kinematic speech data obtained by optoelectronic or X-ray measurements (e.g., Kelso et al., 1985; Smith, Browman, & McGowan, 1988; Vatikiotis-Bateson, 1988). The parameter set for a given gesture is represented as:

$\{ z_{0ik},\; b_{ik},\; k_{ik},\; w_{jk} \}$,

where the subscripts denote numerically either tract variables (i = 1, ..., m) or articulators (j = 1, ..., n), or denote symbolically the particular gesture's linguistic affiliation (k = /p/, /i/, etc.). These parameter sets are incorporated into the interarticulator dynamical system (see Equations [A3] and [A4], Appendix 2) as functions of the current gestural activation matrix, A, according to explicit algebraic blending rules. These rules define or tune the current values for the corresponding components of the vector $z_0$ and matrices K, B, and W in Equations (A3) and (A4) as follows:

$b_{ii} = \sum_{k \in Z_i} \left( p_{Tik}\, b_{ik} \right);$   (1a)

$k_{ii} = \sum_{k \in Z_i} \left( p_{Tik}\, k_{ik} \right);$   (1b)

$z_{0i} = \sum_{k \in Z_i} \left( p_{Tik}\, z_{0ik} \right);$ and   (1c)

$w_{jj} = \sum_{k} \left( p_{Wjk}\, w_{jk} \right) + g_{Njj},$   (1d)


where $p_{Tik}$ and $p_{Wjk}$ are variables denoting the post-blending strengths of gestures whose activations influence the ongoing values of tract-variable and articulatory-weighting parameters, respectively, in Equations A1-A4 (Appendices 1 and 2); $Z_i$ is the set of gestures associated with the ith tract variable; $\Theta_j$ is the set of tract variables associated with the jth model articulator (see Figure 4); the sum in Equation (1d) runs over the gestures associated with the jth articulator; and $g_{Njj} \equiv 1.0 - \min\left[ 1,\, \sum_{i \in \Theta_j} \left( \sum_{k \in Z_i} a_{ik} \right) \right]$. The subscript N denotes the fact that, for parsimony's sake, the $g_{Njj}$ are the same elements used to gate in the jth articulatory component of the neutral attractor (see the Nonactive Gestural Control section to follow). In Equations (1a-1c), the parameters of the ith tract variable assume default values of zero at times when there are no active gestures that involve this tract variable. Similarly, in Equation (1d), the articulatory weighting parameter of the jth articulator assumes a default value of 1.0, due to the contribution of the $g_{Njj}$ term, at times when there are no active gestures involving this articulator.

The $p_{Tik}$ and $p_{Wjk}$ terms in Equation (1) are given by the steady-state solutions of a set of feedforward, competitive-interaction-network dynamical equations (see Appendix 3 for details). These solutions are expressed as follows:

$p_{Tik} = a_{ik} \Big/ \left[ 1 + \beta_{ik} \sum_{l \in Z_i,\, l \neq k} \alpha_{il}\, a_{il} \right];$   (2a)

$p_{Wjk} = a_{jk} \Big/ \left[ 1 + \beta_{jk} \sum_{i \in \Theta_j} \; \sum_{l \in Z_i,\, l \neq k} \alpha_{jl}\, a_{il} \right],$   (2b)

where $\alpha_{il}$ is the competitive-interaction (lateral inhibition) coefficient from gesture-il to gesture-ik, for l ≠ k; and $\beta_{ik}$ is a "gatekeeper" coefficient that modulates the incoming lateral inhibition influences impinging on gesture-ik from gesture-il, for l ≠ k. For parsimony, $\beta_{ik}$ is constrained to equal $1.0/\alpha_{ik}$ for $\alpha_{ik} \neq 0.0$. If $\alpha_{ik} = 0.0$, $\beta_{ik}$ is set to equal 0.0 by convention. Implementing the blended parameters defined by Equations (1a-1c) in the dynamical system defined by Equations (A3) and (A4) creates an attractor layout or field of driving influences in tract-variable space that is specific to the set of currently active gestures. The blended parameters defined by Equation (1d) create a corresponding pattern of relative "receptivities" to these driving influences among the associated synergistic articulators in the coordinative structure.

Using the blending rules provided by Equations (1) and (2), different forms of blending can be specified, for example, among a set of temporally overlapping gestures defined within the same tract variables. The form of blending depends on the relative sizes of the context-independent (time-invariant) α and β parameters associated with each gesture. Three possibilities arise: averaging, suppressing, and adding. For the set of currently active gestures along the ith tract variable, if all α's are equal and greater than zero (all β's are then equal by constraint), then the sum $\sum_{k \in Z_i} p_{Tik}$ is normalized to equal 1.0 and the tract-variable parameters blend by simple averaging. If the α's are unequal and greater than zero, then the sum $\sum_{k \in Z_i} p_{Tik}$ is also normalized to equal 1.0 and the parameters blend by weighted averaging. For example, if gesture-ik's $\alpha_{ik} = 10.0$ and gesture-il's $\alpha_{il} = 0.1$, then gesture-ik's parameter values dominate or "suppress" gesture-il's parameter values in the blending process when both gestures are co-active. Finally, if all α's = 0.0, then all β's = 0.0 by convention, and the parameters in Equation (1) blend by simple addition. Currently, all gestural parameters in Equation (1) are subject to the same form of competitive blending. It would be possible at some point, however, to implement different blending forms for the different parameters, e.g., adding for targets and averaging for stiffnesses, as suggested by recent data on lip protrusion (Boyce, 1988) and laryngeal abduction (Munhall & Lofqvist, 1987).
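To make the behavior of these blending rules concrete, the following minimal Python sketch (our own illustration, not the model's actual implementation; the function names and all numerical values are hypothetical) implements Equations (1a-1c) and (2a) for a single tract variable shared by two co-active gestures, and exhibits the averaging, suppressing, and adding regimes.

```python
import numpy as np

def blend_strengths(a, alpha):
    """Steady-state post-blending strengths p_k (cf. Equation 2a).

    a     : activations a_k of the co-active gestures (0..1)
    alpha : lateral-inhibition coefficients; beta_k = 1/alpha_k, or 0 when
            alpha_k = 0, per the parsimony convention in the text.
    """
    a, alpha = np.asarray(a, float), np.asarray(alpha, float)
    beta = np.where(alpha > 0.0, 1.0 / np.where(alpha > 0.0, alpha, 1.0), 0.0)
    p = np.empty_like(a)
    for k in range(len(a)):
        others = np.arange(len(a)) != k
        p[k] = a[k] / (1.0 + beta[k] * np.sum(alpha[others] * a[others]))
    return p

def blend_parameter(values, p):
    """Blended tract-variable parameter (cf. Equations 1a-1c)."""
    return float(np.sum(p * np.asarray(values, float)))

targets = [0.4, -0.1]   # hypothetical constriction-degree targets
a = [1.0, 1.0]          # both gestures fully active

for label, alpha in [("averaging",   [1.0, 1.0]),
                     ("suppressing", [10.0, 0.1]),
                     ("adding",      [0.0, 0.0])]:
    p = blend_strengths(a, alpha)
    print(label, np.round(p, 3), blend_parameter(targets, p))
# averaging   -> p = [0.5, 0.5],    blended target 0.15
# suppressing -> p ~ [0.99, 0.01],  blended target ~0.395 (first gesture wins)
# adding      -> p = [1.0, 1.0],    blended target 0.30 (simple sum)
```

Note how the suppressing case reproduces the α = 10.0 versus α = 0.1 example from the text: the strong gesture's target dominates the blend almost completely.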

Transformation gating. The tract-variable driving influences shaped by Equations (1) and (2) remain implicit and "disconnected" from the receptive model articulators until these influences are gated explicitly into the articulatory degrees of freedom. This gating occurs with respect to the weighted Jacobian pseudoinverse (i.e., the transformation that relates tract-variable motions to articulatory motions) and its associated orthogonal projection operator (see Appendix 2). Specifically, $J^{*}$ and P are replaced in Equation (A4) by gated forms, $J^{*}_{G}$ and $P_G$, respectively. $J^{*}_{G}$ can be expressed as follows:

$J^{*}_{G} = W^{-1} J_{G}^{T} \left( C + \left[ I_m - G_A \right] \right)^{-1}$   (3a)

where $J_G = G_A J$, and $G_A$ is a diagonal m x m gating matrix for the active tract-variable gestures. Each $g_{Aii} = \min\left[ 1,\, \sum_{k \in Z_i} a_{ik} \right]$, where the summation is defined as in Equations (1) and (2). Each $g_{Aii}$ multiplies the ith row of the Jacobian. This row relates motions of the articulators to motions defined along the ith tract variable (see Equation [A2]). When $g_{Aii}$ = 1 (or 0), the ith tract variable is gated into (out of) $J^{*}_{G}$ and contributes (does not contribute) to $\ddot{\phi}_A$, the vector of active articulatory driving influences (see Equations [A3] and [A4]). $C = J_G W^{-1} J_G^{T}$ embodies the kinematic interrelationships that exist among the currently active set of tract variables. Specifically, C is defined by the set of weighted, pairwise inner products of the gated Jacobian rows. A diagonal element of C, $c_{ii}$, is the weighted inner product (sum of squares) of the ith gated Jacobian row with itself; an off-diagonal element, $c_{hi}$ (h ≠ i), is the weighted inner product (sum of products) of the hth and ith gated Jacobian rows. A pair of gated Jacobian rows has a (generally) nonzero weighted inner product when the corresponding tract variables are active and share some or all articulators in common; the weighted inner product of two gated Jacobian rows equals zero when the corresponding tract variables are active and share no articulators; the inner product also equals zero when one or both rows correspond to a nonactive tract variable. $I_m$ is the m x m identity matrix.

The gated orthogonal projection operator is expressed as follows:

$P_G = G_P \left[ I_n - J^{*}_{G} J_{G} \right]$   (3b)

where $G_P$ is a diagonal n x n gating matrix. Each element $g_{Pjj} = \min\left[ 1,\, \sum_{i \in \Theta_j} \left( \sum_{k \in Z_i} a_{ik} \right) \right]$, where the summations are defined as in Equations (1) and (2).

For example, if there are no active gestures, then $G_A = 0$, $G_P = 0$, and $\left( C + \left[ I_m - G_A \right] \right) = I_m$. Consequently, both $J^{*}_{G}$ and the gated orthogonal projection operator equal zero, and $\ddot{\phi}_A = 0$ according to Equation (A4). If active gestures occur simultaneously in all tract variables, then $G_A = I_m$, $G_P = I_n$, and $P_G = P$. That is, both the gated pseudoinverse and orthogonal projection operator are "full blown" when all tract variables are active, and $\ddot{\phi}_A$ is influenced by the attractor layouts and corresponding driving influences defined over all the tract variables. If only a few of the tract variables are active, these terms are not full blown and $\ddot{\phi}_A$ is only subject to driving influences associated with the attractor layout in the subspace of active tract variables.
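The gating computation itself is compact. The sketch below (our own illustrative code, with a made-up two-articulator Jacobian; W is taken as the identity for simplicity) implements Equation (3a) and reproduces the limiting cases just described: with no active tract variables the gated pseudoinverse vanishes, and with all tract variables active it reduces to the ordinary weighted pseudoinverse.

```python
import numpy as np

def gated_pseudoinverse(J, gA, W=None):
    """Gated, weighted Jacobian pseudoinverse J*_G (cf. Equation 3a).

    J  : m x n Jacobian (tract-variable rows, articulator columns)
    gA : length-m vector of activation gates g_Aii in [0, 1]
    W  : n x n articulator weighting matrix (identity if omitted)
    """
    m, n = J.shape
    W = np.eye(n) if W is None else W
    GA = np.diag(gA)
    JG = GA @ J                       # gate the Jacobian rows
    C = JG @ np.linalg.inv(W) @ JG.T  # inter-tract-variable coupling terms
    # (I_m - GA) fills the diagonal for inactive rows, keeping the sum invertible
    return np.linalg.inv(W) @ JG.T @ np.linalg.inv(C + np.eye(m) - GA)

J = np.array([[1.0, 1.0],   # tract variable 1 uses both articulators
              [0.0, 1.0]])  # tract variable 2 uses only articulator 2

print(gated_pseudoinverse(J, [0.0, 0.0]))  # no active gestures -> zero matrix
print(gated_pseudoinverse(J, [1.0, 1.0]))  # all active -> full pseudoinverse
print(gated_pseudoinverse(J, [1.0, 0.0]))  # only tract variable 1 gated in
```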

Nonactive gestural control: The neutral attractor

We return now to the question of what happens to the model articulators when there is no active control in the model. In such cases, articulator movements are shaped by a "default" neutral attractor. The neutral attractor is a point attractor in model articulator space, whose target configuration corresponds to schwa /ə/ in current modeling. It is possible, however, that this neutral target may be language-specific. The articulatory degrees of freedom in the neutral attractor are uncoupled dynamically, i.e., point attractor dynamics are defined independently for all articulators. At any given point in time, the neutral attractor exerts a set of driving influences on the articulators, $\ddot{\phi}_N$, that can be expressed as follows:

$\ddot{\phi}_N = G_N \left( -B_N \dot{\phi} - K_N \left[ \phi - \phi_{N0} \right] \right),$   (4)

where $\phi_{N0}$ is the neutral target configuration; $B_N$ and $K_N$ are n x n diagonal damping and stiffness matrices, respectively. Because their parameters never change, parameter tuning or blending is not defined for the neutral attractor. The components, $k_{Njj}$, of $K_N$ are typically defined to be equal, although they may be defined asymmetrically to reflect hypothesized differences in the biomechanical time constants of the articulators (e.g., the jaw is more sluggish [has a larger time constant] than the tongue tip). The components, $b_{Njj}$, of $B_N$ are defined at present relative to the corresponding $K_N$ components to provide critical damping for each articulator's neutral point attractor. For parsimony's sake, $B_N$ is also used to define the orthogonal projection vector in Equation (A4); and $G_N$ is an n x n diagonal gating matrix for the neutral attractor. Each element $g_{Njj} \equiv 1.0 - \min\left[ 1,\, \sum_{i \in \Theta_j} \left( \sum_{k \in Z_i} a_{ik} \right) \right]$, where the summations are defined as in Equations (1)-(3). Note that $G_N = I_n - G_P$, where $I_n$ is the n x n identity matrix and $G_P$ is defined in Equation (3b).

The total set of driving influences ($\ddot{\phi}_T$) on the articulators at any given point in time is the sum of an active component ($\ddot{\phi}_A$; see Equations [A3], [A4], and [3]) and a neutral component ($\ddot{\phi}_N$; see Equation [4]), and is defined as follows:

$\ddot{\phi}_T = \ddot{\phi}_A + \ddot{\phi}_N.$   (5)

For example, consider a time when only a tongue-dorsum gesture is active. Then $g_{N11}$ (for LH) = $g_{N33}$ (for ULV) = $g_{N44}$ (for LLV) = 1.0, and $g_{N22}$ (for JA) = $g_{N55}$ (for TBR) = $g_{N66}$ (for TBA) = 0. Active control will exist for the gesturally involved jaw and tongue (JA, TBR, and TBA), but the noninvolved lips (LH, ULV, and LLV) will "relax" independently from their current positions toward their neutral positions according to the specified time constants. If only a bilabial gesture is active, then the complementary situation holds, with $g_{N11} = g_{N22} = g_{N33} = g_{N44} = 0$, and $g_{N55} = g_{N66} = 1.0$. The jaw and lips will be actively controlled, and the tongue will relax toward its neutral configuration. When both bilabial and tongue-dorsum gestures are active simultaneously, all $g_{Njj}$ components equal zero, $\ddot{\phi}_N = 0$, and the neutral attractor has no influence on the articulatory movement patterns. When there is no active control, all $g_{Njj}$ components equal one, and all articulators relax toward their neutral targets.
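A minimal sketch of the neutral-attractor contribution (Equations 4 and 5) follows; this is our own illustration, with a hypothetical stiffness value and the six-articulator ordering used in the example above (LH, JA, ULV, LLV, TBR, TBA).

```python
import numpy as np

def neutral_drive(phi, phi_dot, gN, kN=100.0):
    """Neutral-attractor driving influences (cf. Equation 4).

    phi, phi_dot : articulator positions/velocities, expressed as deviations
                   from the neutral schwa configuration (so phi_N0 = 0)
    gN           : per-articulator neutral gates g_Njj in [0, 1]
    kN           : illustrative stiffness; bN is set for critical damping.
    """
    bN = 2.0 * np.sqrt(kN)   # critical damping for each degree of freedom
    return np.asarray(gN) * (-bN * np.asarray(phi_dot) - kN * np.asarray(phi))

# Only a tongue-dorsum gesture is active: lips (LH, ULV, LLV) relax while
# jaw and tongue body (JA, TBR, TBA) are actively controlled.
gN = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # order: LH JA ULV LLV TBR TBA
phi = np.array([0.2, 0.1, 0.3, -0.2, 0.0, 0.1])
phi_dot = np.zeros(6)

phi_ddot_N = neutral_drive(phi, phi_dot, gN)
phi_ddot_A = np.zeros(6)                  # placeholder: from Eqs. (A3), (A4), (3)
phi_ddot_T = phi_ddot_A + phi_ddot_N      # Equation (5)
print(phi_ddot_T)   # lips accelerate toward neutral; gated articulators get zero
```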

Simulation examples

We now describe results from several simulations that demonstrate how active and neutral control influences are implemented in the model, focusing on instances of gestural coproduction.

Parameter tuning. The form of parameter blending for speech production has been hypothesized to be tract-variable-specific (Saltzman et al., 1987). As already discussed in the Active Gestural Control section, Ohman (1967) showed that for VCV sequences, when the medial consonant was a velar (/g/ or /k/), the surrounding vowels appeared to shift the velar's place of constriction but not its degree. These results have been simulated (qualitatively, at least) by superimposing temporally the activation intervals for the medial velar consonant and the flanking vowels. During the resultant period of consonant-vowel coproduction, an averaging blend was implemented for tuning the tract-variable parameters of tongue-dorsum constriction location, and a suppressing blend (velar suppresses vowel) was implemented for tongue-dorsum constriction degree (see Figure 7; Saltzman et al., 1987; cf. Coker, 1976, for an alternative method of generating similar results).

Figure 7. Simulated vocal tract shapes. A. First contact of tongue dorsum and upper tract wall during symmetric vowel-velar-vowel sequences. B. Corresponding steady-state vowel productions.


This blending scheme for constriction degree is consistent with the assumption in current modeling that the amount of suppression during blending is related to differences in the sonority (Jespersen, 1914) or openness of the vocal tract associated with each of the blended gestures. Gestural sonority is reflected in the constriction-degree target parameters of each gesture. For tongue-dorsum gestures, vowels have large positive-valued targets for constriction degree that reflect their open tract shapes (high sonority), and stops have small negative target values that reflect contact-plus-compression against the upper tract wall (low sonority).

Transformation gating. Simulations described in this article of blending for gestures defined along different tract variables have been restricted to periods of temporal overlap between pairs of bilabial and tongue-dorsum gestures. Under these circumstances, articulatory trajectories have been generated for sequences involving consonantal gestures superposed onto ongoing vocalic gestures that match (qualitatively, at least) the trajectories observed in X-ray data. In particular, Tiede and Browman (1988) analyzed X-ray data that included the vertical motions of pellets placed on the lower lip, lower incisor (i.e., jaw), and "mid-tongue" surface during /pV1pV2p/ sequences. The mid-tongue pellet height corresponds, roughly, to tongue-dorsum height in the current model. Tiede and Browman found that the mid-tongue pellet moved with a relatively smooth trajectory from its position at the onset of the first vowel to its position near the offset of the second vowel. Specifically, when V1 was the medium-height vowel /e/ and V2 was the low vowel /a/, the mid-tongue pellet showed a smooth lowering trajectory over this gestural time span (see Figure 8). During this same interval, the jaw and lower lip pellets moved through comparably smooth gestural sequences of lowering for V1, raising for the medial /p/, and lowering for V2.

Figure 8. Acoustic waveform and vertical components (up/down over time, in frames) of articulatory X-ray pellet data for the tongue blade, mid-tongue, rear tongue, lower lip, and jaw during a /pV1pV2p/ utterance. (From Tiede & Browman, 1988; used with authors' permission.)


Figure 9a shows a simulation with the current model of a similar sequence, /əbabə/. The main point is that the vowel-to-vowel trajectory for tongue-dorsum constriction degree is smooth, going from the initial schwa to the more open /a/. This tongue-dorsum pattern occurs simultaneously with the comparably smooth closing-opening gestural sequences for jaw height and lip aperture.

Two earlier versions of the present model generated unacceptable trajectories for this same sequence that are instructive concerning the model's functioning. In one version (the "modular" model), each constriction type operated independently of the other during periods of coproduction. For example, during periods of bilabial and tongue-dorsum overlap, driving influences were generated along the tract variables associated with each constriction. These influences were then transformed into articulatory driving influences by separate, constriction-specific Jacobian pseudoinverses (e.g., see Equations [A3] and [A4]). The bilabial pseudoinverse involved only the Jacobian rows (see Equation [A2] and Figure 4) for lip aperture and protrusion, and the tongue-dorsum pseudoinverse involved only the Jacobian rows for tongue-dorsum constriction location and degree. The articulatory driving influences associated with each constriction were simply averaged at the articulatory level for the shared jaw. The results are shown in Figure 9b, where it is evident that the tongue dorsum does not display the relatively smooth vowel-to-vowel trajectory seen in the X-ray data and with the current model. Rather, the trajectory appears to be perturbed in a complex manner by the simultaneous jaw and lip aperture motions. It is hypothesized that these perturbations are due to the fact that the modular model did not incorporate, by definition, the off-diagonal elements of the C-matrix used currently in the gated pseudoinverse (Equation [3]). Recall that these elements reflected the kinematic relationships that exist among different, concurrently active tract variables by virtue of shared articulators. In the modular model, these terms were absent because the constriction-specific pseudoinverses were defined explicitly to be independent of each other. Thus, if the current model is a reasonable one, it tells us that knowledge of inter-tract-variable kinematic relationships must be embodied in the control and coordinative processes for speech production.
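The role of the off-diagonal C elements can be seen in a toy computation (ours, not the authors'): two tract variables, lip aperture and tongue-dorsum height, sharing a jaw articulator. The full pseudoinverse achieves both tract-variable drives exactly, while the "modular" scheme (solving each constriction independently and averaging jaw contributions) does not, because the shared-jaw coupling has been discarded.

```python
import numpy as np

J = np.array([[1.0, 1.0, 0.0],   # lip aperture  = jaw + lip contributions
              [1.0, 0.0, 1.0]])  # tongue dorsum = jaw + tongue-body contributions
z = np.array([1.0, -0.5])        # hypothetical tract-variable driving influences

# Current model (W = I): one pseudoinverse over both active tract variables.
# The off-diagonal entry of C = J @ J.T reflects the shared jaw.
full = J.T @ np.linalg.inv(J @ J.T) @ z

# "Modular" model: constriction-specific pseudoinverses, jaw simply averaged.
drive_la = np.array([0.5, 0.5]) * z[0]   # (jaw, lip) split the LA drive
drive_td = np.array([0.5, 0.5]) * z[1]   # (jaw, tongue body) split the TD drive
modular = np.array([0.5 * (drive_la[0] + drive_td[0]),  # jaw: simple average
                    drive_la[1],                        # lip
                    drive_td[1]])                       # tongue body

print("full   :", full)     # J @ full == z: both tract-variable drives achieved
print("modular:", modular)  # J @ modular != z: shared-jaw coupling is lost
```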

Figure 9. Simulations of the sequence /əbabə/, showing acoustics, tongue-dorsum constriction degree (TDCD), lip aperture (LA), and jaw angle (JA). A. Current model. B. Older "modular" version. C. Older "flat jaw" version.


A different type of failure by a second earlier version of the model provides additional constraints on the form that must be taken by such inter-tract-variable knowledge. En route to developing the current model, a mistake was made that generated a perfectly flat jaw trajectory (the "flat jaw" model) for the same sequence (/əbabə/; see Figure 9c). Interestingly, however, the tongue-dorsum trajectory was virtually identical to that generated with the current model. The reason for this anomalous jaw behavior was that the gated pseudoinverse (Equation [3]) had been forced accidentally to be "full blown" regardless of the ongoing state of gestural activation. This meant that all tract variables were gated on in this transformation, even when the associated gestures were not activated. The specification of the attractor layout at the tract-variable level, however, worked as it does in the current model. Active gestures "create" point attractors in the control landscape for the associated tract variables. In this landscape, the currently active target can be considered to lie at the bottom of a valley whose walls are slightly sticky. The resultant tract-variable motion is to slide stably down the valley wall from its current position toward the target, due to the nonzero driving influences associated with the system's attraction to the target position. Nonactive gestures, on the other hand, "create" only flat tract-variable control landscapes, in which no position is preferred over any other and the value of the tract-variable driving influences equals zero. Recall from the Gestural Primitives section (Figures 3 and 4) that the model includes a lower-tooth-height tract variable that maps one-to-one onto jaw angle. For the sequence /əbabə/, this tract variable is never active and, consequently, the corresponding component of the tract-variable driving influence vector is constantly equal to zero. When the gated pseudoinverse is full blown, this transformation embodies the kinematic relationships among the bilabial, tongue-dorsum, and lower-tooth-height tract variables that exist by virtue of the shared jaw. This means that the transformation treats the zero driving component for lower tooth height as a value that should be passed on to the articulators, in conjunction with the driving influences from the bilabial and tongue-dorsum constrictions. As a result, the jaw receives zero active driving, and because the jaw starts off at its neutral position for the initial schwa, it also receives zero driving from the neutral attractor (Equation [4]) throughout the sequence. The result is the observed flat trajectory for the jaw. Thus, if the current model is a sensible one, this unacceptable "flat jaw" simulation tells us that the kinematic interrelationships embodied in the system's pseudoinverse at any given point in time must be gated functions of the currently active gesture set.
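The flat-jaw failure mode is easy to reproduce in miniature (our own toy example, with W = I): add a lower-tooth-height row that maps one-to-one onto the jaw, and compare a properly gated pseudoinverse with one forced "full blown."

```python
import numpy as np

def gated_pinv(J, gA):
    """J*_G with W = I (cf. Equation 3a)."""
    m = J.shape[0]
    JG = np.diag(gA) @ J
    return JG.T @ np.linalg.inv(JG @ JG.T + np.eye(m) - np.diag(gA))

J = np.array([[1.0, 1.0],    # lip aperture       = jaw + lip
              [1.0, 0.0]])   # lower tooth height = jaw alone
z = np.array([1.0, 0.0])     # LA is driven; LTH is never active (zero drive)

print(gated_pinv(J, [1.0, 0.0]) @ z)  # gated:      [0.5 0.5], jaw and lip share
print(np.linalg.inv(J) @ z)           # full blown: [0.  1. ], jaw frozen flat
```

With the full-blown transformation, the zero LTH component is treated as a command the jaw must honor, so the lip does all the work and the jaw never moves.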

SERIAL DYNAMICS

The task-dynamic model defines, in effect, a selective pattern of coupling among the articulators that is specific to the set of currently active gestures. This coupling pattern is shaped according to three factors: a) the current state of the gestural activation matrix; b) the tract-variable parameter sets and articulator weights associated with the currently active gestures; and c) the geometry of the nonlinear kinematic mapping between articulatory and tract-variable coordinates (represented by J and Jᵀ in Equations [A2] and [A3]) for all associated active gestures. The model provides an intrinsically dynamical account of multiarticulator coordination within the activation intervals of single (perturbed and unperturbed) gestures. It also holds promise for understanding the blending dynamics of coproduced gestures that share articulators in common. However, task dynamics does not currently provide an intrinsically dynamic account of the intergestural timing patterns comprising even a simple speech sequence (see Figures 1 and 6). At the level of phonologically defined segments, the sequence might be a repetitive alternation between a given vowel and consonant, e.g., /bababa.../. At a more fine-grained level of description, the sequence might be a "constellation" (Browman & Goldstein, 1986, in press) of appropriately phased gestures, e.g., the bilabial closing-opening and the laryngeal opening-closing for word-initial /p/ in English. As discussed earlier, current simulations rely on explicit gestural scores to provide the layout of activation intervals over time and tract variables for such utterances.

The lack of an appropriate serial dynamics is a major shortcoming in our speech modeling to date. This shortcoming is linked to the fact that the most-studied and best-understood dynamical systems in the nonlinear dynamics literature are those whose behaviors are governed by point attractors, periodic attractors (limit cycles), and strange attractors. (Strange attractors underlie the behaviors of chaotic dynamical systems, in which seemingly random movement patterns have deterministic origins; e.g., Ruelle, 1980.) For nonrepetitive and nonrandom speech sequences, such attractors appear clearly inadequate. However, investigations in the computational modeling of connectionist (parallel distributed processing, neuromorphic, neural net) dynamical systems have focused on the problem of sequence control and the understanding of serial dynamics (e.g., Grossberg, 1986; Jordan, 1986, in press; Kleinfeld & Sompolinsky, 1988; Lapedes & Farber, cited in Lapedes & Farber, 1986; Pearlmutter, 1988; Rumelhart, Hinton, & Williams, 1986; Stornetta, Hogg, & Huberman, 1988; Tank & Hopfield, 1987). Such dynamics appear well suited to the task of sequencing or orchestrating the transitions in activation among gestural primitives in a dynamical model of speech production.

Intergestural timing: A connectionist approach

Explaining how a movement sequence isgenerated in a connectionist computationalnetwork becomes primarily a matter of explainingthe patterning of activity over time among thenetwork's processing elements or nodes. Thispatterning occurs through cooperative andcompetitive interartions among the nodesthemselves. Each node can store only a smallamount of information (typically only a fewmarker bits or a single scalar activity-level) and iscapab:e of only a few simple arithmetic or logicalactions. Consequently, the interactions areconducted, not through individual programs orsymbol strings, but through very simplemessagessignals limited to variations instrength. Such networks, in which thetransmission of symbol strings between yodels isminimal or nonc.datent, depend for their successon the availability and attunement of the rightconnections among the nodes (e.g., Ballard, 1986;Fahlman & Hinton, 1987; Feldr an & Ballard,1982; Grossberg, 1982; rumelhart, ilinton, &McClelland, 1986). The knowledge ':onstrainingthe performance of a serial activity, includingcoarticulatory patterning, is embodied in theseconnections rather than stored in specializedmemory banks. That is, the structure anddynamics of the network govern the movement asit evolves, and knowledge of the movement's timecourse never appears in an explicit, declarativeform.

In connectionist models, the plan for a sequenceis static and timeless, and is identified with a setof input units. Output units in the network areassumed to represent the control elements of themovement components and to affect these

elements in direct proportion to the level ofoutput-unit activation. One means of producingtemporal ordering is to (a) establish an activation-level gradient through lateral inhibition amongthe output units so that those referring to earlieraspects of the sequence are more active than thosereferring to later aspects; and (b) inhibit outputunits once a threshold value of activation isachieved (e.g., Grossberg, 1978). Suchconnectionist systems, however, have difficultyproducing Bumf- ;es in which movementcomponents are repeated (e.g., Rumelhart &Norman, 1982). In fact, a general a ikwardness indealing with the sequential control of networkactivity has been acknowledged as a majorshortcoming of most current connectionist models(e.g., Hopfield & Tank, 1986). Some promisingdevelopments have been reported that addresssuch criticisms (e.g., Grossberg 1986; Jordan,1986, in press; Kleinfeld & Sompolinsky, 1988;Lapedes & Farber, cited in Lapedes & Farber,1986; Pearlmutter, 1988; Rumelhart, Hinton, &Willams, 1986; Stornetta, Hogg, & Huberman,1988; Tank & Hopfield, 1987). We now describe indetail one such development (Jordan, 1986, inpress).

Serial dynamics: A representative model

Jordan's (1986, in press) connectionist model of serial order can be used to define a time-invariant dynamical system with an intrinsic time scale that spans the performance of a given output sequence. There are three levels in his model (see Figure 10). At the lowest level are output units. Even if a particular output unit is activated repeatedly in an intended sequence, it is represented by only one unit. Thus, the model adopts a type rather than token representation scheme for sequence elements. In the context of the present article, a separate output unit would exist for each distinct gesture in a sequence. The tuning and gating consequences of gestural activation described earlier (see the Active Gestural Control section) are consistent with Jordan's suggestion that "the output of the network is best thought of as influencing articulator trajectories indirectly, by setting parameters or providing boundary conditions for lower level processes which have their own inherent dynamics" (Jordan, 1986, p. 23). For example, in the repetitive sequence /bababa.../, there would be (as a first approximation) only two output units, even though each unit potentially could be activated an indefinite number of times as the sequence continues. In this example, the output units define the activation coordinates for the consonantal bilabial gesture and the vocalic tongue-dorsum gesture, respectively. The values of the output units are the activation values of the associated gestures, and can vary continuously across a range normalized from zero (the associated gestural unit is inactive) to one (the associated gestural unit is maximally active).

Figure 10. Basic network architecture for Jordan's (1986, in press) connectionist model of serial order (not all connections are shown). The plan units and their connections (indicated in light grey) are not used in our proposed hybrid model for the serial dynamics of speech production (see text and footnote [8] for details).

At the highest level of Jordan's model are the state units that, roughly speaking, define among themselves a dynamical flow with an intrinsic time scale specific to the intended sequence. These state-unit dynamics are defined by an equation of motion (the next-state function) that is implemented in the model by weighted recurrent connections among the state units themselves, and from the output units to the state units. Finally, at an intermediate level of the model are a set of hidden units. These units are connected to both the state units and the output units by two respective layers of weighted paths, thereby defining a nonlinear mapping or output function from state units to output units. The current vector of output activations is a function of the preceding state, which is itself a function of the previous state and previous output, and so on. Thus, the patterning over time of onsets and offsets for the output units does not arise as a consequence of direct connections among these units. Rather, such relative timing is an emergent property of the dynamics of the network as a whole. Temporal ordering among the output elements of a gestural sequence is an implicit consequence of the network architecture (i.e., the input-output functions of the system elements, and the pattern of connections among these elements) and the sequence-specific set of constant values for the weights associated with each connection pathway.
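The forward dynamics of such a network can be sketched as follows (our own illustration; the layer sizes are arbitrary and the weights are random rather than learned, so the resulting trajectory is meaningless until training): state units evolve via recurrent connections from themselves and from the outputs, and a hidden layer maps state to output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_hidden, n_out = 4, 8, 2   # e.g., 2 output units for /bababa.../

W_ss = 0.5 * rng.standard_normal((n_state, n_state))  # state -> state (recurrent)
W_so = 0.5 * rng.standard_normal((n_state, n_out))    # output -> state (recurrent)
W_hs = rng.standard_normal((n_hidden, n_state))       # state -> hidden
W_oh = rng.standard_normal((n_out, n_hidden))         # hidden -> output

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

state = np.zeros(n_state)
output = np.zeros(n_out)
for t in range(10):
    state = np.tanh(W_ss @ state + W_so @ output)   # next-state function
    output = sigmoid(W_oh @ sigmoid(W_hs @ state))  # output function via hidden units
    print(t, np.round(output, 3))   # gestural activation values in [0, 1]
```

In an actual simulation the weights would be set by the learning procedure described next, so that the output trajectory realizes the intended gestural activation sequence.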

The network can "learn" a different set of weight values for each intended utterance in Jordan's (1986, in press) model, using a "teaching" procedure that incorporates the generalized delta rule (back-propagation method) of Rumelhart, Hinton, and Williams (1986). According to this rule, error signals generated at the output units (defined by the difference between the current output vector and a "teaching" vector of desired activation values) are projected back into the network to allow the hidden units to change their weights. The weights on each pathway are changed in proportion to the size of the error being back-propagated along these pathways, and error signals for each hidden unit are computed by adding the error signals arriving at these units. Rumelhart, Hinton, and Williams (1986) showed that this learning algorithm implements essentially a gradient search in weight space for the set of weights that allows the network to perform with a minimum sum of squared output errors.

Jordan (1986, in press) reported simulation results in which activation values of the output units represented values of abstract phonetic features such as degree of voicing, nasality, or lip rounding. The serial network was trained to produce sequences of "phonemes," in which each phoneme was defined as a particular bundle of context-independent target values for the features. These features were not used to generate articulatory movement patterns, however. After training, the network produced continuous trajectories over time for the feature values. These trajectories displayed several impressive properties. First, the desired values were attained at the required positions in a given sequence. Second, the featural trajectories showed anticipatory and carryover coarticulatory effects for all features that were contextually dependent on the composition of the sequence as a whole. This was due to the generalizing capacity of the network, according to which similar network states tend to produce similar outputs, and the fact that the network states during production of a given phoneme are similar to the states in which nearby phonemes are learned. Finally, the coarticulatory temporal "spreading" of a given featural target value was not unlimited. Rather, it was restricted due to the dropoff in state similarity between a given phoneme and its surrounding context.

Toward a hybrid dynamical model

Jordan's (1986, in press) serial network has produced encouraging results for understanding the dynamics of intergestural timing in speech production. However, as already discussed, his speech simulations were defined with respect to a standard list of phonetic features, and were not related explicitly to actual articulatory movement patterns. We plan to incorporate such a serial network into our speech modeling as a means of patterning the gestural activation intervals in the task-dynamic model summarized in Equation (5). The resultant hybrid dynamical system (Figure 2) for articulatory control and coordination should provide a viable basis for further theoretical developments, guided by empirical findings in the speech production literature. For example, it is clear that the hybrid model must be able to accommodate data on the consequences for intergestural timing of mechanical perturbations delivered to the articulators during speaking. Without feedback connections that directly or indirectly link the articulators to the intergestural level, a mechanical perturbation to a limb or speech articulator could not alter the timing structure of a given movement sequence. Recent data from human subjects on unimanual oscillatory movements (Kay, 1986; Kay, Saltzman, & Kelso, 1989) and speech sequences (Gracco & Abbs, in press) demonstrate that transient mechanical perturbations induce systematic shifts in the timing of subsequent movement elements. In related animal studies (see footnote [3]), transient muscle-nerve stimulation during swimming movements of a turtle's hindlimb was also shown to induce phase shifts in the locomotor rhythm. Taken together, such data provide strong evidence that functional feedback pathways exist from the articulators to the intergestural level in the control of sequential activity. These pathways will be incorporated into our hybrid dynamical model (see the lighter pathway indicated in Figure 2).

Intrinsic vs. extrinsic timing: Autonomous vs. nonautonomous dynamics

As discussed earlier (see the Gestural Activation Coordinates section), there are two time spans associated with every gesture in the current model. The first is the gestural settling time, defined as the time required for an idealized, temporally isolated gesture to reach a certain criterion percentage of the distance from initial to target location in tract-variable coordinates. This time span is a function of the gesture's intrinsic set of dynamic parameters (e.g., damping, stiffness); a worked example follows below. The second time span, the gestural activation interval, is defined according to a gesture's sequence-specific activation function. In the present model, gestural activation is specified as an explicit function of time in the gestural score for a given speech sequence. In the hybrid model discussed in the previous section, these activation functions would emerge as implicit consequences of the serial dynamics intrinsic to a given sequence.
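For a critically damped gesture, the settling time follows directly from the stiffness parameter. The computation below is our own illustration (unit mass, 90% criterion, hypothetical stiffness values), using the fact that a critically damped unit starting at rest has remaining fraction $(1 + \omega_n t)\,e^{-\omega_n t}$ of its initial distance to target.

```python
import numpy as np

def settling_time(k, criterion=0.9):
    """Time for a critically damped, unit-mass gesture (b = 2*sqrt(k)) to
    cover `criterion` of its initial distance to target, starting at rest."""
    wn = np.sqrt(k)                  # natural frequency
    lo, hi = 0.0, 100.0 / wn         # bracket the crossing
    for _ in range(60):              # simple bisection on the remaining fraction
        mid = 0.5 * (lo + hi)
        if (1 + wn * mid) * np.exp(-wn * mid) > 1.0 - criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(settling_time(k=100.0))  # ~0.39 s for stiffness 100 (wn = 10 rad/s)
print(settling_time(k=400.0))  # doubling wn halves the settling time (~0.19 s)
```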

These considerations may serve to clarify certain aspects of a relatively longstanding and tenacious debate on the issue of intrinsic (e.g., Fowler, 1977, 1980) versus extrinsic (e.g., Lindblom, 1983; Lindblom et al., 1987) timing control in speech production. In the framework of the current model, intragestural temporal patterns (e.g., settling times, interarticulator asynchronies in peak velocities) can be characterized unambiguously, at least for isolated gestures, as intrinsic timing phenomena. These phenomena are emergent properties of the gesture-specific dynamics implicit in the coordinative structure spanning tract-variable and articulator coordinates (Figure 2, interarticulator level). In terms of intergestural timing, the issue is not so clear and depends on one's frame of reference. If one focuses on the interarticulatory level, then all activation inputs originate from the "outside," and activation timing must be considered extrinsic with reference to this level. Activation timing is viewed as being controlled externally according to whatever type of clock is assumed to exist or be instantiated at the system's intergestural level. However, if one considers both levels within the same frame of reference then, by definition, the timing of activation becomes intrinsic to the system as a whole. Whether or not this expansion of reference frame is useful in furthering our understanding of speech timing control depends, in part, on the nature of the clock posited at the intergestural level. This issue of clock structure leads us to a somewhat more technical consideration of the relationship between intrinsic and extrinsic timing on the one hand, and autonomous and nonautonomous dynamical systems on the other hand.


For speech production, one can posit that intrinsic timing is identified with autonomous dynamics, and extrinsic timing with nonautonomous dynamics. In an autonomous dynamical system, the terms in the corresponding equation of motion are explicit functions only of the system's "internal" state variables (i.e., positions and velocities). In contrast, a nonautonomous system's equation of motion contains terms that are explicit functions of "external" clock-time, t, such as $f(t) = \cos(\omega t)$ (e.g., Haken, 1983; Thompson & Stewart, 1986). However, the autonomous-nonautonomous distinction is just as susceptible to one's selected frame of reference as is the distinction between intrinsic and extrinsic timing. The reason is that any nonautonomous system of equations can be transformed into an autonomous one by adding an equation (or equations) describing the dynamics of the (formerly) external clock-time variable. That is, the frame of reference for defining the overall system equation can be extended to include the dynamics of both the original nonautonomous system and the formerly external clock. In this new set of equations, a state of unidirectional coupling exists between system elements. The clock variable affects, but is unaffected by, the rest of the system variables. However, when such unidirectional coupling exists and the external clock meters out time in the standard, linear time-flow of everyday clocks and watches, we feel that its inclusion as an extra equation of motion adds little to our understanding of system behavior. In these cases, the nonautonomous description probably should be retained.
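As a concrete illustration of this standard transformation (textbook material, not specific to the present model): a sinusoidally driven point attractor

$\ddot{x} = -b\dot{x} - kx + A\cos(\omega t)$

is nonautonomous, but augmenting the state with a clock phase $\theta$,

$\dot{\theta} = \omega, \qquad \ddot{x} = -b\dot{x} - kx + A\cos\theta,$

yields an equivalent autonomous system in which the clock variable drives, but is not driven by, the rest of the state; this is exactly the unidirectional coupling described above.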

In earlier versions of the present model (Kelso et al., 1986a, 1986b; Saltzman, 1986; see also Appendices 1 and 2) only temporally isolated gestures or perfectly synchronous gesture pairs were simulated. In these cases, the equations of motion were truly autonomous, because the parameters at the interarticulatory level did not vary over the time course of the simulations. The parameters in the present model, however, are time-varying functions of the activation values specified at the intergestural level in the gestural score. Hence, the interarticulatory dynamics (Equation [5]) are currently nonautonomous. Because the gestural score specifies these activation values as explicit functions of standard clock-time, little understanding is to be gained by conceptualizing the system as an autonomous one that incorporates the unidirectionally coupled dynamics of standard clock-time and the interarticulatory level. Thus, the present model most sensibly should be considered as nonautonomous. This would not be true, however, for the proposed hybrid model in which: a) clock-time dynamics are nontrivial and intrinsic to the utterance-specific serial dynamics of the intergestural level; and b) the intergestural and interarticulator dynamics mutually affect one another. In this case, we posit that much understanding is to be gained by incorporating the dynamics of both levels into a single set of bidirectionally coupled, autonomous system equations.

INTERGESTURAL COHESION

As indicated earlier in this article (e.g., in Figure 1), speech production entails the interleaving through time of gestures defined across several different articulators and tract variables. In our current simulations, the timing of activation intervals for tract-variable gestures is controlled through the gestural score. Accordingly, gestures unfold independently over time, producing simulated speech patterns much as a player piano generates music. This rule-based description of behavior in the vocal tract makes no assumptions about coordination or functional linkages among the gestures themselves. However, we believe that such linkages exist, and that they reflect the existence of dynamical coupling within certain gestural subsets. Such coupling imbues these gestural "bundles" with a temporal cohesion that endures over relatively short (e.g., sublexical) time spans during the course of an utterance.

Support for the notion of intergestural cohesion has been provided by experiments that have focused on the structure of correlated variability evidenced between tract-variable gestures in the presence of externally delivered mechanical perturbations. Correlated variability is one of the oldest concepts in the study of natural variation, and it is displayed in a system if "when slight variations in any one part occur..., other parts become modified" (Darwin, 1896, p. 128). For example, in unperturbed speech it is well known that a tight temporal relation exists between the oral and laryngeal gestures for voiceless obstruents (e.g., Lofqvist & Yoshioka, 1981a). For example, word-initial aspirated /p/ (in English) is produced with a bilabial closing-opening gesture and an accompanying glottal opening-closing gesture whose peak coincides with stop release. In a perturbation study on voiceless obstruents (Munhall et al., 1986), laryngeal compensations occurred when the lower lip was perturbed during the production of the obstruent. Specifically, if the lower lip was unexpectedly pulled downward just prior to oral closure, the laryngeal abduction gesture for devoicing was delayed. Shaiman and Abbs (1987) have also reported data consistent with this finding. Such covariation patterns indicate a temporal cohesion among gestures, suggesting to us the existence of higher-order, multigesture units in speech production.

How might intergestural cohesion be conceptualized? We hypothesize that such temporal stability can be accounted for in terms of dynamical coupling structure(s) defined among gestural units. Such coupling has been shown previously to induce stable intergestural phase relations in a model of two coupled gestural units whose serially repetitive (oscillatory) dynamics have been explored both experimentally and theoretically in the context of rhythmic bimanual movements (e.g., Haken, Kelso, & Bunz, 1985; Kay, Kelso, Saltzman, & Schöner, 1987; Scholz, 1986; Schöner, Haken, & Kelso, 1986). This type of model also provides an elegant account of certain changes in intergestural phase relationships that occur with increases in performance rate in the limbs and, by extension, the speech articulators. In speech, such stability and change have been examined for bilabial and laryngeal sequences consisting of either the repeated syllable /pi/ or /ip/ (Kelso, Munhall, Tuller, & Saltzman, 1985; also discussed in Kelso et al., 1986a, 1986b). When /pi/ is spoken repetitively at a self-selected "comfortable" rate, the glottal and bilabial component gestures for /p/ maintain a stable intergestural phase relationship in which peak glottal opening lags peak oral closing by an amount that results in typical (for English) syllable-initial aspiration of the /p/. For repetitive sequences of /ip/ spoken at a similarly comfortable rate, peak glottal opening occurred synchronously with peak oral closing, as is typical (for English) of unaspirated (or minimally aspirated) syllable-final /p/. When /pi/ was produced repetitively at a self-paced increasing rate, intergestural phase remained relatively stable at its comfort value. However, when /ip/ was scaled similarly in rate, its phase relation was maintained at its comfort value until, at a critical speaking rate, an abrupt shift occurred to the comfort phase value and corresponding acoustic pattern for the /pi/.

In the context of the model of bimanual movement, the stable intergestural phase values at the comfort rate and the phase shift observed with rate scaling are reflections of the dynamical behavior of nonlinearly coupled, higher-order oscillatory modes. This use of modal dynamics parallels the identification of tract variables with mode coordinates in the present model (see Appendix 1). Recall that the dynamics of these modal tract variables serve to organize patterns of cooperativity among the articulators in a gesture-specific manner (see the earlier section entitled Model Articulator and Tract Variable Coordinates). Such interarticulator coordination is shaped according to a coupling structure among the articulators that is "provided by" the tract-variable modal dynamics. By extension, patterns of intergestural coordination are shaped according to inter-tract-variable coupling structures "provided by" a set of even higher-order multigesture modes. Because tract variables are defined as uncoupled in the present model (Equation [A1]), it seems clear that (some sort of) inter-tract-variable coupling must be introduced to simulate the multigesture functional units evident in the production of speech.10
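For reference, the coupled-oscillator model invoked here (Haken, Kelso, & Bunz, 1985) reduces to a dynamics of relative phase φ, and a few lines of Python (our own sketch; the parameter values are arbitrary) reproduce the rate-induced transition: anti-phase coordination (φ = π) is stable only while the coupling ratio b/a exceeds 1/4, and collapses toward in-phase (φ = 0) as rate increases and b/a falls.

```python
import numpy as np

def relax(phi0, a, b, dt=0.01, steps=5000):
    """Integrate HKB relative-phase dynamics: dphi/dt = -a sin(phi) - 2b sin(2phi)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (-a * np.sin(phi) - 2.0 * b * np.sin(2.0 * phi))
    return phi

start = np.pi - 0.1                 # begin near anti-phase coordination
print(relax(start, a=1.0, b=1.0))   # slow rate (b/a = 1):   settles near pi
print(relax(start, a=1.0, b=0.1))   # fast rate (b/a = 0.1): switches to ~0
```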

Such multigesture units could play (at least) three roles in speech production. One possibility is a hierarchical reduction of degrees of freedom in the control of the speech articulators beyond that provided by individual tract-variable dynamical systems (e.g., Bernstein, 1967). A second, related possibility is that multigesture functional units are particularly well suited to attaining articulatory goals that are relatively inaccessible to individual (or uncoupled) gestural units. For example, within single gestures the associated synergistic articulators presumably cooperate in achieving local constriction goals in tract-variable space, and individual articulatory covariation is shaped by these spatial constraints. Coordination between tract-variable gestures might serve to achieve more global aerodynamic/acoustic effects in the vocal tract. Perhaps the most familiar of such between-tract-variable effects is that of voice onset time (Lisker & Abramson, 1964), in which subtle variations in the relative timing of laryngeal and oral gestures contribute to perceived contrasts in the voicing and aspiration characteristics of stop consonants.

The third possible role for multigesture units is that of phonological primitives. For example, in Browman and Goldstein's (1986) articulatory phonology, the phonological primitives are gestural constellations that are defined as "cohesive bundles" of tract-variable gestures. Intergestural cohesion is conceived in terms of the stability of relative phasing or spatiotemporal relations among gestures within a given constellation. In some cases, constellations correspond rather closely to traditional segmental descriptions: for example, a word-initial aspirated /p/ (in English) is represented as a bilabial closing-opening gesture and a glottal opening-closing gesture whose peak coincides with stop release; a word-initial /s/ (in English) is represented as a tongue-tip raising-lowering gesture and a glottal opening-closing gesture whose peak coincides with mid-frication. In other cases, however, it is clear that Browman and Goldstein offered a perspective that is both linguistically radical and empirically conservative. They rejected the traditional notion of segment and allowed as phonological primitives only those gestural constellations that can be observed directly from physical patterns of articulatory movements. Thus, in some instances, segmental and constellation representations diverge. For example, a word-initial /sp/ cluster (unaspirated in English) is represented as a constellation of two oral gestures (a tongue-tip and bilabial constriction-release sequence) and a single glottal gesture whose peak coincides with mid-frication. This representation is based on the experimental observation that for such clusters only one single-peaked glottal gesture occurs (e.g., Lisker, Abramson, Cooper, & Schvey, 1969), and thus captures the language-specific phonotactic constraint (for English) that there is no voicing contrast for stops following an initial /s/. The gestural constellation representation of /sp/ is consequently viewed as superior to a more traditional segmental approach, which might predict two glottal gestures for this sequence. Our perspective on this issue is similar to that of Browman and Goldstein in that we focus on the gestural structure of speech. Like these authors, we assume that the underlying phonological primitives are context-independent cohesive "bundles" or constellations of gestures whose cohesiveness is indexed by stable patterns of intergestural phasing. However, we adopt a position that, in comparison with theirs, is both more conservative linguistically and more radical empirically. We assume that gestures cohere in bundles corresponding, roughly, to traditional segmental descriptions, and that these segmental units maintain their integrity in fluent speech. We view many context-dependent modifications of the gestural components of these units as emergent consequences of the serial dynamics of speech production. For example, we consider the single glottal gesture accompanying English word-initial /sp/ clusters to be a within-tract-variable blend of separate glottal gestures associated with the underlying /s/ and /p/ segments (see the following section for a detailed discussion of the observable kinematic "traces" left by such underlying gestures).

INTERGESTURAL TIMING PATTERNS: EFFECTS OF SPEAKING RATE AND SEQUENCE COMPOSITION

One of the working assumptions in this article is that gestural coproduction is an integral feature of speech production, and that many factors influence the degree of gestural overlap found for a given utterance. For example, a striking phenomenon accompanying increases in speaking rate or degree of casualness is that the gestures associated with temporally adjacent segments tend to "slide" into one another with a resultant increase in temporal overlap (e.g., Browman & Goldstein, in press; Hardcastle, 1985; Machetanz, 1989; Nittrouer, Munhall, Kelso, Tuller, & Harris, 1988). Such intergestural sliding occurs both between and within tract variables, and is influenced by the composition of the segmental sequence as well as its rate or casualness of production. We turn now to some examples of the effects on intergestural sliding and blending of changes in speaking rate and sequence composition.

Speaking rate

Hardcastle (1985) showed with electropalatographic data that the tongue gestures associated with producing the (British English) consonant sequence /kl/ tend to slide into one another and increase their temporal overlap with experimentally manipulated increases in speaking rate. Many examples of interarticulator sliding were also identified some years ago by Stetson (1951). Stetson was interested in studying the changes in articulatory timing that accompany changes in speaking rate and rhythm. Particularly interesting are his scaling trials in which utterances were spoken at increasing rates. Figure 11 is one of Stetson's figures showing the time course of lip (L), tongue (T), and air pressure (A) records for productions of "sap" at different speaking rates. As can be seen, the labial gesture for /p/ and the tongue gesture for /s/ are present throughout the scaling trial, but their relative timing varies with increased speaking rate. By syllable 4 the tongue gesture for /s/ and the labial gesture for /p/ from the preceding syllable completely overlap, and syllable identity is altered from then on in the trial.


Figure 11. Articulatory and aerodynamic records taken during productions of the syllable "sap" as speaking rate increases. (From Stetson, 1951; used with publisher's permission.) Stetson's original legend (his Figure 62, "Abutting Consonants; Continuant with Stop; Syllables: sap sap . . ."): L, lip marker (contact grows shorter and lighter as the rate increases and overlapping and coincidence occur); T, tongue marker (well-marked doubling from syllables 5-6, thereafter the single releasing compound form ps); A, air in mouth (doubling forms, syllables 5-6); AO, air outside (varied in appearance because of the high pressure during the continuants; the plateau of s becomes a mere point as the compound form appears, syllables 6-7).

In terms of the present theoretical framework, these instances of relative sliding can be described as occurring between the activation intervals associated with tongue-dorsum gestures (for the velar consonant /k/), tongue-tip gestures (for the alveolars /s/ and /l/), and lip-aperture gestures (for the bilabial /p/). During periods of temporal overlap, the gestures sharing articulators in common are blended. Because the gestures are defined in separate tract variables, they are observably distinct in articulatory movement records. Such patterns of change might be interpretable as the response of the hybrid dynamical model discussed earlier (see the Hybrid Model section) to hypothetically simple changes in the values of a control parameter or parameter set, presumably at the model's intergestural level (see Figure 2). One goal of future empirical and simulation research is to test this notion and, if possible, to identify this parameter set and the means by which it is scaled with speaking rate.

Lofqvist & Yoshioka (1981b) have providedevider..0 for similar sliding and blending withintract-variables in an analysis of transilluminationdata on glottal devoicing gestures (abdoction-adduction sequences) for a native speaker ofIcelandic (a Germanic language closely related toSwedish, English, etc.). These investigators

demonstrated intergestural temporal reorga-nization of glottal activity with spontaneousvariatisn of speaking rate. For example, the cross-word-boundary sequence Mk/ was accompaniedby a two-peaked glottal gesture at a slow rate, butby a single-peaked gesture at fast raters. Theinterpretation of these data was that chere weretwo underlying glottal gestures (one for /t/, one for/k/) at both the slow and fast rates. The visibleresult of only a sing, e gesture at the fast rateappeared to be the simple consequence of blendingand merging these two highly overlapping,underlying gestures defined within the same tractvariable. These results have since been replicatedfor two speakers of North American Englishduring experimentally controlled variations in theproduction rates of /s#t/ sequences (Munhall &Lafqvist, 1987).

Sequence composition: Laryngeal gestures, oral-laryngeal dominance

The rate scaling data described in the previous section for laryngeal gestures provide support for the hypothesis that the single-peaked gestures observed at fast speaking rates resulted from the sliding and blending of two underlying, sequentially adjacent gestures. In turn, this interpretation suggests a reasonable account of glottal behavior in the production of segmental sequences containing fricative-stop clusters.

Glottal transillumination and speech acoustic data for word-final /s#/, /ks#/, and /ps#/ (unpublished data from Fowler, Munhall, Saltzman, & Hawkins, 1986a, 1986b) showed that the glottal opening-closing gesture for /s#/, in comparison to the other cases, was smaller in amplitude, shorter in duration, and peaked closer in time to the following voicing onset. These findings are consistent with the notion that a separate glottal gesture was associated with the cluster-initial stop, and that this gesture left its trace both spatially and temporally in blending with the following fricative gesture to produce a larger,11 longer, and earlier-peaking single gestural aggregate. Other data from this experiment also indicate that the single-peaked glottal gestures observed in word-final clusters were the result of the blending of two overlapping underlying gestures. These data focus on the timing of peak glottal opening relative to the acoustic intervals (closure for /p/ or /k/, frication for /s/) associated with the production of /s#/, /ps#/, /ks#/, /sp#/, and /sk#/. For /s#/, the glottal peak occurred at mid-frication. However, for /ps#/ and /ks#/ it occurred during the first quarter of frication; for /sp#/ and /sk#/, it occurred during the third quarter of frication. These data indicate that an underlying glottal gesture was present for the /p/ or /k/ in these word-final clusters that blended with the gesture for the /s/ in a way that "pulled" or "perturbed" the peak of the gestural aggregate towards the /p/ or /k/ side of the cluster. The fact that the resultant glottal peak remained inside the frication interval for the /s/ may be ascribed, by hypothesis, to a relatively greater dominance over the timing of the glottal peak associated with /s/ compared to that associated with the voiceless stops /p/ or /k/.

Dominance refers to the strength of hypothesized coupling between oral acoustic/articulatory events (e.g., frication and closure intervals) and glottal events (e.g., peak glottal opening). The dominance for a voiceless consonant's oral constriction over its glottal timing appears to be influenced by (at least) two factors.12 The first is the manner class of the segment: frication intervals (at least for /s/) dominate glottal behavior more strongly than stop closure intervals. This factor was highlighted previously for word-initial clusters by Browman and Goldstein (1986; cf. Kingston's [in press] related use of oral-laryngeal "binding" and Goldstein's [in press] reply to Kingston). As just discussed, this factor also appears to influence glottal timing in word-final clusters. The second factor is the presence of a preceding word-initial boundary: word-initial consonants dominate glottal behavior more strongly than the same non-word-initial consonants. These two factors appear to have approximately additive effects, as illustrated by the following examples of fricative-stop sequences defined word- or syllable-initially and across word boundaries. In these cases, as was the case word-finally, the notion of dominance can be invoked to suggest that the single-peaked glottal gestures observed for such clusters are also blends of two underlying, overlapping gestures.

Example 1. In English, Swedish, and Icelandic (e.g., Löfqvist, 1980; Löfqvist & Yoshioka, 1980, 1981a, 1981b, 1984; Yoshioka, Löfqvist, & Hirose, 1981), word-initial /s-(voiceless) stop/ clusters and /s/ are produced with a single-peaked glottal gesture that peaks at mid-frication. Word-initial /p/ is produced with a glottal gesture peaking at or slightly before closure release. Thus, the word-initial position of the /s/ in these clusters apparently bolsters the intrinsically high "segmental" dominance of the /s/, and eliminates the displacement of the glottal peak toward the /p/ that was described earlier for word-final clusters.

Example 2. The rate scaling study for the cross-word-boundary /s#t/ sequence described earlier (Munhall & Löfqvist, 1987) showed two single-peaked glottal gestures for the slowest speaking rates, one double-peaked gesture for intermediate rates, and one single-peaked gesture at the fastest rate. At the slow and intermediate rates, the first peak occurred at mid-frication for the /s#/ and the second peak occurred at closure release for the /#t/. The single peak at the fastest rate occurred at the transition between frication offset and closure onset. These patterns indicate that when the two underlying glottal gestures merged into a single-peaked blend, the peak was located at a "compromise" position between the intrinsically stronger /s/ and the intrinsically weaker /t/ augmented by its word-initial status.

Example 3. In Dutch (Yoshioka, Löfqvist, & Collier, 1982), word-initial /#p/ (voiceless, unaspirated) is produced with a glottal gesture peaking at midclosure, and the glottal peak for /#s/ and /#sp/ occurs at mid-frication. However, for /#ps/ (an allowable sequence in Dutch), the glottal peak occurs at the transition between closure offset and frication onset. Again, this suggests that when the inherently stronger /s/ is augmented by word-initial status in /#sp/, the glottal peak cannot be perturbed away from mid-frication by the following /p/. However, when the intrinsically weaker /p/ is word-initial, the glottal peak is pulled by the /p/ from mid-frication to the closure-frication boundary.

Example 4. In Swedish (e.g., Löfqvist & Yoshioka, 1980), some word-final voiceless stops are aspirated (e.g., /k#/), and are produced with glottal gestures peaking at stop release. Word-initial /#s/ is produced with its glottal peak occurring at mid-frication (see Example 1). When the cross-word-boundary sequence /k#s/ is spoken at a faster rate, a single glottal gesture is produced with its peak occurring approximately at mid-frication. This is consistent with the high degree of glottal dominance expected for the intrinsically stronger /s/ in a word-initial position for the /k#s/ sequence.

These examples provide support for the hypothesis that fricative-stop sequences can be associated with an underlying set of two temporally overlapping but slightly offset component glottal gestures blended into a single gestural aggregate. These examples focused on the observable kinematic "traces" evident in the timing relations between the aggregate glottal peak and the acoustic intervals of the sequence. Durational data also suggest that such single observable gestures result from a two-gesture blending process. For example, the glottal gesture for the cluster /#st/ is longer in duration than either of the gestures for /#s/ and /#t/ (McGarr & Löfqvist, 1988). A similar pattern has also been found by Cooper (1989) for word-internal, syllable-initial /#s/, /#p/, and /#sp/.

SUMMARY

We have outlined an account of speech production that removes much of the apparent conflict between observations of surface variability on the one hand, and the hypothesized existence of underlying, invariant gestural units on the other hand. In doing so, we have described progress made toward a dynamical model of speech patterning that can produce fluent gestural sequences and specify articulatory trajectories in some detail. Invariant units are posited in the form of relations between context-independent sets of gestural parameters and corresponding subsets of activation, tract-variable, and articulatory coordinates in the dynamical model. Each gesture's influence over the valving and shaping of the vocal tract waxes and wanes according to the activation strengths of the units. Variability emerges in the unfolding tract-variable and articulatory movements as a result of both the utterance-specific temporal interleaving of gestural activations, and the accompanying patterns of blending or coproduction. The timing of the gestures and the interarticulator cooperativity evidenced for a currently active gesture set are governed by two functionally distinct but interacting levels in the model: the intergestural and interarticulatory coordination levels, respectively. At present, the dynamics of the interarticulatory level are sufficiently well developed to offer promising accounts of movement patterns observed during unperturbed and mechanically perturbed speech sequences, and during periods of coproduction. We have only begun to explore the dynamics of the intergestural level. Yet even these preliminary considerations, grounded in developments in the dynamical systems literature, have already begun to shed light on several longstanding issues in speech science, namely, the issues of intrinsic versus extrinsic timing, the nature of intergestural cohesion, and the hypothesized existence of segmental units in the production of speech. We find these results encouraging, and look forward to further progress within this research framework.

REFERENCES

Abbs, J. H., & Gracco, V. L. (1983). Sensorimotor actions in the control of multimovement speech gestures. Trends in Neuroscience, 6, 393-395.

Abraham, R., & Shaw, C. (1982). Dynamics-The geometry of behavior.Part 1: Periodic behroior. Santa Cruz, CA: Aerial Press.

Abraham, R., & Shaw, C (1986). Dynamics A visual introduction.In F. E. Yates (Ed.), Self-organazing systems: The emergence oforder. New York: Plenum Press.

Arbib, M. A. (1984). From synergies and embryos to motorschemas. In H. T. A. Whiting (Ed.). Human motor actions:Bernstein reassessed (pp. 545-562). New York: North-Holland.

Ballard, D. H. (1986). Cortical connections and parallel processing:Structure and function. The Behavioral and Brain Sciences, 9, 67-120.

Ballieul, J., Hollerbach, J., & Brocket, R. (1984; December).Programming and control of kinematically redundantmanipulators. Proceedings of the 23rd IEEE Conference on Decisionand Control. Las Vegas, NV.

Bell-Berti, F., & Harris, K. S (1981). A temporal model of speechproduction. Phonetics, 38, 9-2C

Benati, M., Caglio, S., Morasso, P., Tagllasco, V., & Zaccaria, R.(1980). Antlynpomorphic robotics. I. Representing mechanicalcomplexity. Biological Cybernetics, 38,125 -140.

Bernstein, N. A. (1967). The coordination and regulation ofmovements. London: Pergamon Press. Reprinted in H. T. A.Whiting (Ed.). (1984). Human motor actions: Bernstein reassessed.New York: North-*-4olland.

Boyce, S. E. (1988). The influence of phonological structure "narticwatory organization in Turkish and English: Vowel .rmonand coarticulation. Unpublished doctoral dissertation,Department of Linguistic, Yale University.


Browman, C. P., & Goldstein, L. (1986). Towards an articulatoryphonology. Phonology Yearbook, 3, 21)-252.

Browman, C. P Goldstein, L. (in press). Tiers In articulatoryphonology, with 'LAO implications for casual speech. In J.Kingston Sc M. E. leckman (Eels.), Papers in laboratoryPhonology: I. Between the Grammar and the Physics of Speech.Cambridge, England: Cambridge University Press.

Brownian, C. P., Goldstein, L, Kelso, J. A. S., Rubin, P., &Saltzman, E. L (1984). Articulatory synthesis from underlyingdynamics (Abstract). Journal of the Acoustical Society of America,75 (SuppL 1), S22423.

Browman, C. P., Goldstein, L., Saltzman, E. L., & Smith, C. (1986).GEST: A con-.putational model for speech production usingdynamically defined articulatory gestures [Abstract). Journal ofthe Acoustical Society of America, 80 (Suppl. 1), 597.

Bullock, D., & Gross.arg, S. (1988a). Neural dynamics of plannedarm movements: Emergent invariants and speed-accuracy

operties during trajectory formation. Psychological Review, 95,49-90.

Bullock, D., & Grossberg, S. (1988b). The VITE model: A neuralcommand circuit for generating arm and articulatortrajectories. In J. A. S. Kelso, A. J. Mandell, & M. F. Schlesinger(Eds.), Dynamic pettenu in complex systems. Singapore: WorldScientific Publisher&

Chomsky, N., & Ha.le, M. (1968). flu %and pattern 4 English. NewYork Harper & Pnw.

Cohen, M. A., Grossberg, S., & Stork, D. G. (1988). Speechperception and production by a self-organizing neuralnetwork. In Y. C. tee (Ed.), Evolution, kerning, cognition, andadvanced =Mixtures. Hong Kong: World Sdentik Publishers.

Coker, C. H. (1976). A model of articulatory dynamics andcontrol. Proceeding; of the IEEE, 64, 452-460.

k (1989). (An articulatory description of the distributionof aspiration in English]. Unpublished research.

Darwin, C. (1896). The origin of species. New York: Caldwell.Fahlatan, S. E., & Hinton, C. E. (1987). Connectionist architectures

for artificial intelligence. Computer, 20, 1C0-109.Feldman, J. A., Sc Ballard, D. H. (1982). Connectionist models and

their properties. Cognitive Science, 4 205-254.Folkins, J. W., & Abbs, J. H. (1975). Lip and jaw motor control

during speech: Responses to resistive loading of the jaw.Journal of Speech and Hearing Research,18, 207-220.

Fowler, C. A. (1977). Timing control in speech production. Bloomington, IN: Indiana University Linguistics Club.

Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing control. Journal of Phonetics, 8, 113-133.

Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived rhythms of speech: Cyclic production of vowels in monosyllabic stress feet. Journal of Experimental Psychology: General, 112, 386-412.

Fowler, C. A., Munhall, K. G., Saltzman, E. L., & Hawkins, S. (1986a). Acoustic and articulatory evidence for consonant-vowel interactions [Abstract]. Journal of the Acoustical Society of America, 80 (Suppl. 1), S96.

Fowler, C. A., Munhall, K. G., Saltzman, E. L., & Hawkins, S. (1986b). [Laryngeal movements in word-final single consonants and consonant clusters]. Unpublished research.

Gay, T. J., Lindblom, B., & Lubker, J. (1981). Production of bite-block vowels: Acoustic equivalence by selective compensation. Journal of the Acoustical Society of America, 69, 802-810.

Goldstein, L. (in press). On articulatory binding: Comments on Kingston's paper. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology: I. Between the grammar and the physics of speech. Cambridge, England: Cambridge University Press.

Gracco, V. L., & Abbs, J. H. (1986). Variant and invariant characteristics of speech movements. Experimental Brain Research, 65, 156-166.

Gracco, V. L., & Abbs, J. H. (1989). Sensorimotor characteristics of speech motor sequences. Experimental Brain Research, 75, 586-590.

Greene, P. H. (1971). Introduction. In I. M. Gelfand, V. S. Gurfinkel, S. V. Fomin, & M. L. Tsetlin (Eds.), Models of the structural-functional organization of certain biological systems (pp. xi-xxxi). Cambridge, MA: MIT Press.

Grossberg, S. (1978). A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In R. Rosen & F. Snell (Eds.), Progress in theoretical biology (Vol. 5, pp. 233-374). New York: Academic Press.

Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Amsterdam: Reidel Press.

Grossberg, S. (1986). The adaptive self-organization of serial order in behavior: Speech, language, and motor control. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines (Vol. 1, pp. 187-294). New York: Academic Press.

Grossberg, S., & Mingolla, E. (1986). Computer simulation of neural networks for perceptual psychology. Behavior Research Methods, Instruments, & Computers, 18, 601-607.

Guckenheimer, J., & Holmes, P. (1983). Nonlinear oscillations, dynamical systems, and bifurcations of vector fields. New York: Springer-Verlag.

Haken, H. (1983). Advanced synergetics. Heidelberg: Springer-Verlag.

Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics, 51, 347-356.

Hardcastle, W. J. (1981). Experimental studies in lingual coarticulation. In R. E. Asher & E. J. A. Henderson (Eds.), Towards a history of phonetics (pp. 50-66). Edinburgh, Scotland: Edinburgh University Press.

Hardcastle, W. J. (1985). Some phonetic and syntactic constraints on lingual coarticulation during /kl/ sequences. Speech Communication, 4, 247-263.

Harris, K. S. (1984). Coarticulation as a component of articulatory descriptions. In R. G. Daniloff (Ed.), Articulation assessment and treatment issues (pp. 147-167). San Diego: College Hill Press.

Henke, W. L. (1966). Dynamic articulatory model of speech production using computer simulation. Unpublished doctoral dissertation, Massachusetts Institute of Technology.

Hopfield, J. J., & Tank, D. W. (1986). Computing with neural circuits: A model. Science, 233, 625-633.

Jespersen, O. (1914). Lehrbuch der Phonetik [Textbook of phonetics]. Leipzig: Teubner.

Joos, M. (1948). Acoustic phonetics. Language, 24 (Language Monograph No. 23), 1-136.

Jordan, M. I. (1985). The learning of representations for sequential performance. Unpublished doctoral dissertation, Department of Cognitive Science and Psychology, University of California, San Diego, CA.

Jordan, M. I. (1986). Serial order in behavior: A parallel distributed processing approach (Tech. Rep. No. 8604). San Diego: University of California, Institute for Cognitive Science.

Jordan, M. I. (in press). Serial order: A parallel distributed processing approach. In J. L. Elman & D. E. Rumelhart (Eds.), Advances in connectionist theory: Speech. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Kay, B. A. (1986). Dynamic modeling of rhythmic limb movements: Converging on a description of the component oscillators. Unpublished doctoral dissertation, Department of Psychology, University of Connecticut, Storrs, CT.

Kay, B. A., Kelso, J. A. S., Saltzman, E. L., & Schöner, G. (1987). Space-time behavior of single and bimanual rhythmical movements: Data and limit cycle model. Journal of Experimental Psychology: Human Perception and Performance, 13, 178-192.


Kay, B. A., Saltzman, E. L., & Kelso, J. A. S. (1989). Steady-state and perturbed rhythmical movements: A dynamical analysis. Manuscript submitted for publication.

Keating, P. A. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA Working Papers in Phonetics, 62, 1-13.

Kelso, J. A. S., Munhall, K. G., Tuller, B., & Saltzman, E. L. (1985). [Phase transitions in speech production]. Unpublished research.

Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986a). The dynamical perspective on speech production: Data and theory. Journal of Phonetics, 14, 29-59.

Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986b). Intentional contents, communicative context, and task dynamics: A reply to the commentators. Journal of Phonetics, 14, 171-196.

Kelso, J. A. S., & Tuller, B. (in press). Phase transitions in speech production and their perceptual consequences. In M. Jeannerod (Ed.), Attention and performance XIII. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Kelso, J. A. S., Tuller, B., Vatikiotis-Bateson, E., & Fowler, C. A. (1984). Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance, 10, 812-832.

Kelso, J. A. S., Vatikiotis-Bateson, E., Saltzman, E. L., & Kay, B. A. (1985). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. Journal of the Acoustical Society of America, 77, 266-280.

Kent, R. D., & Minifie, F. D. (1977). Coarticulation in recent speech production models. Journal of Phonetics, 5, 115-133.

Kingston, J. (in press). Articulatory binding. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology: I. Between the grammar and physics of speech. Cambridge, England: Cambridge University Press.

Klein, C. A., & Huang, C. H. (1983). Review of pseudoinverse control for use with kinematically redundant manipulators. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 245-250.

Kleinfeld, D., & Sompolinsky, H. (1988). Associative neural network model for the generation of temporal patterns: Theory and application to central pattern generators. Biophysical Journal, 54, 1039-1051.

Krakow, R. A. (1987, November). Stress effects on the articulation and coarticulation of labial and velic gestures. Paper presented at the meeting of the American Speech-Language-Hearing Association, New Orleans, LA.

Kugler, P. N., Kelso, J. A. S., & Turvey, M. T. (1980). On the concept of coordinative structures as dissipative structures: I. Theoretical lines of convergence. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 3-47). New York: North-Holland.

Kugler, P. N., Kelso, J. A. S., & Turvey, M. T. (1982). On the control and coordination of naturally developing systems. In J. A. S. Kelso & J. E. Clark (Eds.), The development of movement control and coordination (pp. 5-78). Chichester, England: John Wiley.

Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Lapedes, A., & Farber, R. (1986). Programming a massively parallel, computation universal system: Static behavior (Preprint No. LA-UR 86-7179). Los Alamos, NM: Los Alamos National Laboratory.

Lennard, P. R. (1985). Afferent perturbations during "monopodal" swimming movements in the turtle: Phase-dependent cutaneous modulation and proprioceptive resetting of the locomotor rhythm. The Journal of Neuroscience, 5, 1434-1445.

Lennard, P. R., & Hermanson, J. W. (1985). Central reflex modulation during locomotion. Trends in Neuroscience, 8, 483-486.

Lindblom, B. (1983). Economy of speech gestures. In P. F. MacNeilage (Ed.), The production of speech (pp. 217-245). New York: Springer-Verlag.

Lindblom, B., Lubker, J., Gay, T., Lyberg, B., Branderud, P., & Holmgren, K. (1987). The concept of target and speech timing. In R. Channon & L. Shockey (Eds.), In honor of Ilse Lehiste (pp. 161-181). Dordrecht, Netherlands: Foris Publications.

Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422.

Lisker, L., Abramson, A. S., Cooper, F. S., & Schvey, M. H. (1969). Transillumination of the larynx in running speech. Journal of the Acoustical Society of America, 45, 1544-1546.

Löfqvist, A. (1980). Interarticulator programming in stop production. Journal of Phonetics, 8, 475-490.

Löfqvist, A., & Yoshioka, H. (1980). Laryngeal activity in Swedish obstruent clusters. Journal of the Acoustical Society of America, 68, 792-801.

Löfqvist, A., & Yoshioka, H. (1981a). Interarticulator programming in obstruent production. Phonetica, 38, 21-34.

Löfqvist, A., & Yoshioka, H. (1981b). Laryngeal activity in Icelandic obstruent production. Nordic Journal of Linguistics, 4, 1-18.

Löfqvist, A., & Yoshioka, H. (1984). Intrasegmental timing: Laryngeal-oral coordination in voiceless consonant production. Speech Communication, 3, 279-289.

Macchi, M. (1985). Segmental and suprasegmental features and lip and jaw articulators. Unpublished doctoral dissertation, Department of Linguistics, New York University, New York, NY.

Machetanz, J. (1989, May). Tongue movements in speech at different rates. Paper presented at the Salk Laboratory for Language and Cognitive Studies, San Diego, CA.

MacNeilage, P. F. (1970). Motor control of serial ordering of speech. Psychological Review, 77, 182-196.

Mattingly, I. G. (1981). Phonetic representation and speech synthesis by rule. In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 415-420). Amsterdam: North-Holland.

McGarr, N. S., & Löfqvist, A. (1988). Laryngeal kinematics in voiceless obstruents produced by hearing-impaired speakers. Journal of Speech and Hearing Research, 31, 234-239.

Miyata, Y. (1987). Organization of action sequences in motor learning: A connectionist approach. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 496-507). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Miyata, Y. (1988). The learning and planning of actions (Tech. Rep. No. 8802). San Diego, CA: University of California, Institute for Cognitive Science.

Munhall, K., & Löfqvist, A. (1987). Gestural aggregation in speech. PAW Review, 2, 13-15.

Munhall, K. G., & Kelso, J. A. S. (1985). Phase-dependent sensitivity to perturbation reveals the nature of speech coordinative structures [Abstract]. Journal of the Acoustical Society of America, 78 (Suppl. 1), S38.

Munhall, K. G., Löfqvist, A., & Kelso, J. A. S. (1986). Laryngeal compensation following sudden oral perturbation [Abstract]. Journal of the Acoustical Society of America, 80 (Suppl. 1), S109.

Nittrouer, S., Munhall, K., Kelso, J. A. S., Tuller, B., & Harris, K. S. (1988). Patterns of interarticulator phasing and their relation to linguistic structure. Journal of the Acoustical Society of America, 84, 1653-1661.

Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39, 151-168.


Öhman, S. E. G. (1967). Numerical model of coarticulation. Journal of the Acoustical Society of America, 41, 310-320.

Pearlmutter, B. A. (1988). Learning state space trajectories in recurrent neural networks. In D. S. Touretzky, G. E. Hinton, & T. J. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann.

Perkell, J. S. (1969). Physiology of speech production: Results and implications of a quantitative cineradiographic study. Cambridge, MA: MIT Press.

Rubin, P. E., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.

Ruelle, D. (1980). Strange attractors. The Mathematical Intelligencer, 2, 126-137.

Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 45-76). Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 318-362). Cambridge, MA: MIT Press.

Rumelhart, D. E., & Norman, D. A. (1982). Simulating a skilled typist: A study of skilled cognitive-motor performance. Cognitive Science, 6, 1-36.

Saltzman, E. L. (1979). Levels of sensorimotor representation. Journal of Mathematical Psychology, 20, 91-163.

Saltzman, E. L. (1986). Task dynamic coordination of the speech articulators: A preliminary model. Experimental Brain Research, Ser. 15, 129-144.

Saltzman, E. L., Goldstein, L., Browman, C. P., & Rubin, P. (1988a). Dynamics of gestural blending during speech production [Abstract]. Neural Networks, 1, 316.

Saltzman, E. L., Goldstein, L., Browman, C. P., & Rubin, P. (1988b). Modeling speech production using dynamic gestural structures [Abstract]. Journal of the Acoustical Society of America, 84 (Suppl. 1), S146.

Saltzman, E. L., & Kelso, J. A. S. (1983). Toward a dynamical account of motor memory and control. In R. Magill (Ed.), Memory and control of action (pp. 17-38). Amsterdam: North-Holland.

Saltzman, E. L., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.

Saltzman, E. L., Rubin, P., Goldstein, L., & Browman, C. P. (1987). Task-dynamic modeling of interarticulator coordination [Abstract]. Journal of the Acoustical Society of America, 82 (Suppl. 1), S15.

Schöner, G., Haken, H., & Kelso, J. A. S. (1986). A stochastic theory of phase transitions in human hand movement. Biological Cybernetics, 53, 247-257.

Scholz, J. P. (1986). A nonequilibrium phase transition in human bimanual movement: Test of a dynamical model. Unpublished doctoral dissertation, Department of Psychology, University of Connecticut, Storrs, CT.

Shaiman, S., & Abbs, J. H. (1987, November). Phonetic task-specific utilization of sensorimotor actions. Paper presented at the meeting of the American Speech-Language-Hearing Association, New Orleans, LA.

Smith, C. L., Browman, C. P., & McGowan, R. S. (1988). Applying the program NEWPAR to extract dynamic parameters from movement trajectories [Abstract]. Journal of the Acoustical Society of America, 84 (Suppl. 1), S128.

Stetson, R. H. (1951). Motor phonetics: A study of speech movements in action. Amsterdam: North-Holland. Reprinted in J. A. S. Kelso & K. G. Munhall (Eds.). (1988). R. H. Stetson's motor phonetics: A retrospective edition. Boston: College-Hill.

Stornetta, W. S., Hogg, T., & Huberman, B. A. (1988). A dynamical approach to temporal information processing. In D. Z. Anderson (Ed.), Neural information processing systems. New York: American Institute of Physics.

Sussman, H. M., MacNeilage, P. F., & Hanson, R. J. (1973). Labial and mandibular dynamics during the production of bilabial consonants: Preliminary observations. Journal of Speech and Hearing Research, 16, 397-420.

Tank, D. W., & Hopfield, J. J. (1987). Neural computation by concentrating information in time. Proceedings of the National Academy of Sciences, USA, 84, 1896-1900.

Thompson, J. M. T., & Stewart, H. B. (1986). Nonlinear dynamics and chaos: Geometrical methods for engineers and scientists. New York: Wiley.

Tiede, M. K., & Browman, C. P. (1988). [Articulatory x-ray and speech acoustic data for CV1CV2C sequences]. Unpublished research.

Turvey, M. T. (1977). Preliminaries to a theory of action with reference to vision. In R. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing: Toward an ecological psychology (pp. 211-265). Hillsdale, NJ: Lawrence Erlbaum Associates.

Turvey, M. T., Rosenblum, L. D., Schmidt, R. C., & Kugler, P. N. (1986). Fluctuations and phase symmetry in coordinated rhythmic movements. Journal of Experimental Psychology: Human Perception and Performance, 12, 564-583.

Vatikiotis-Bateson, E. (1988). Linguistic structure and articulatory dynamics: A cross-language study. Bloomington, IN: Indiana University Linguistics Club.

von Holst, E. (1973). The behavioural physiology of animals and man: The collected papers of Erich von Holst (Vol. 1; R. Martin, Trans.). London: Methuen and Co., Ltd.

Whitney, D. E. (1972). The mathematics of coordinated control of prosthetic arms and manipulators. ASME Journal of Dynamic Systems, Measurement, and Control, 94, 303-309.

Winfree, A. T. (1980). The geometry of biological time. New York: Springer-Verlag.

Wing, A. M. (1980). The long and short of timing in response sequences. In G. E. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 469-486). New York: North-Holland.

Wing, A. M., & Kristofferson, A. B. (1973). Response delays and the timing of discrete motor responses. Perception & Psychophysics, 14, 5-12.

Yoshioka, H., Löfqvist, A., & Collier, R. (1982). Laryngeal adjustments in Dutch voiceless obstruent production. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, 16, 27-35.

Yoshioka, H., Löfqvist, A., & Hirose, H. (1981). Laryngeal adjustments in the production of consonant clusters and geminates in American English. Journal of the Acoustical Society of America, 70, 1615-1623.

FOOTNOTES

*Ecological Psychology, 1989, 1(4), 333-382.
†Department of Communicative Disorders, Elborn College, University of Western Ontario, London, Ontario.

1The term gesture is used, here and elsewhere in this article, to denote a member of a family of functionally equivalent articulatory movement patterns that are actively controlled with reference to a given speech-relevant goal (e.g., a bilabial closure). Thus, in our usage gesture and movement have different meanings. Although gestures are composed of articulatory movements, not all movements can be interpreted as gestures or gestural components.


2For example, we and others have asserted that several coordinate systems (e.g., articulatory and higher-order, goal-oriented coordinates), and mappings among these coordinate systems, must be involved implicitly in the production of speech. We have adopted one method of representing these mappings explicitly in the present model (i.e., using Jacobians and Jacobian pseudoinverses; see Appendix 2). We make no strong claim, however, as to the neural or behavioral reality of these specific methods.

3An analogous functional partitioning has also been suggested in recent physiological studies by Lennard (1985) and Lennard and Hermanson (1985) on cyclic swimming motions of single hindlimbs in the turtle. In this work, the authors argued for a model of the locomotor neural circuit for turtle swimming that consists of two functionally distinct but interacting components. One component, analogous to the present interarticulator level, is a central intracycle pattern generator (CIPG) that organizes the patterning of muscular activity within each locomotor cycle. The second component, analogous to the present intergestural level, is an oscillatory central timing network (CTN) that is responsible for rhythmically activating or entraining the CIPG to produce an extended sequence of cycles (see also von Holst, 1973). A related distinction between "motor" and "clock" coordinative processes, respectively, has been proposed in the context of human manual rhythmic tasks consisting of either continuous oscillations at the wrist joints (e.g., Turvey, Rosenblum, Schmidt, & Kugler, 1986) or discrete finger tapping sequences (e.g., Wing, 1980; Wing & Kristofferson, 1973).

4We do not mean to imply that the production of vocal tract constrictions and the shaping of articulatory trajectories are the primary goals of speech production. The functional role of speech gestures is to control air pressures and flows in the vocal tract so as to produce distinctive patterns of sound. In this article, we emphasize gestural form and stability as phonetic organizing principles for the sake of relative simplicity. Ultimately, the gestural approach must come to grips with the aerodynamic sound-production requirements of speech.

5Since the preparation of this article, the task-dynamic model was extended to incorporate control of the tongue-tip (TTCL, TTCD), glottal (GLO), and velic (VEL) constrictions. These tract variables and associated articulator sets are also shown in Figures 3 and 4. Results of simulations using these new gestures have been reported elsewhere in preliminary form (Saltzman, Goldstein, Browman, & Rubin, 1988a, 1988b).

6Gestural activation pulses are similar functionally to Joos's (1948) theorized "innervation waves," whose ongoing values reflected the strength of vocal tract control associated with various phonological segments or segmental components. They are also analogous to the "phonetic influence functions" used by Mattingly (1981) in the domain of acoustic speech synthesis-by-rule. Finally, the activation pulses share with Fowler's (1983) notion of segmental "prominence" the property of being related to the "extent to which vocal tract activity is given over to the production of a particular segment" (p. 392).

7Coarticulatory effects could also originate in two simpler ways. In the first case, "passive" coproduction could result from carryover effects associated with the cessation of active gestural control, due to the inertial sluggishness or time constants inherent in the articulatory subsystems (e.g., Coker, 1976; Henke, 1966). However, neither active nor passive coproduction need be involved in coarticulatory phenomena, at least in a theoretical sense. Even if a string of segments were produced as a temporally discrete (i.e., non-coproduced) sequence of target articulatory steady-states, coarticulatory effects on articulatory movement patterns would still result. In this second case, context-dependent differences in articulatory transitions to a given target would simply reflect corresponding differences in the interpolation of trajectories from the phonologically allowable set of immediately preceding targets. Both "sluggishness" and interpolation coarticulatory effects appear to be present in the production of speech.

8In Jordan's (1986; in press) model, a given network can learn a single set of weights that will allow it to produce several different sequences. Each such sequence is produced (and learned) in the presence of a corresponding constant activation pattern in a set of plan units (see Figure 10). These units provide a second set of inputs to the network's hidden layer, in addition to the inputs provided by the state units. We propose, however, to use Jordan's model for cases in which different sets of weights are learned for different sequences. In such cases, the plan units are no longer required, and we ignore them in this article for purposes of simplicity.

9To teach the network to perform a given sequence, Jordan (1986; in press) first initialized the network to zero, and then presented a sequence of teaching vectors (each corresponding to an element in the intended sequence), delivering one every fourth time step. At these times, errors were generated, back-propagated through the network, and the set of network weights was incrementally adjusted. During the three time steps between each teaching vector, the network was allowed to run free with no imposed teaching constraints. At the end of the teaching vector sequence the network was reinitialized to zero, and the entire weight-correction procedure was repeated until the sum of the squared output errors fell below a certain criterion. After training, the network's performance was tested starting with the state units set to zero.
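The training regime described in this note can be made concrete with a small numerical sketch. The following is an illustration only, not Jordan's actual implementation: it assumes a single hidden layer of logistic units, state units that simply copy the previous output vector (omitting Jordan's decay term), error back-propagation only at teaching steps (no unrolling through time), and a fixed number of training passes in place of the squared-error criterion; all layer sizes, the learning rate, and the toy target sequence are invented.

import numpy as np

# Minimal sketch of the Jordan-style training regime of footnote 9,
# under the simplifying assumptions stated above.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_state, n_hidden, n_out = 3, 8, 3
W1 = rng.normal(0.0, 0.5, (n_hidden, n_state))      # state -> hidden
W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))        # hidden -> output
teachers = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
lr, spacing = 0.5, 4            # one teaching vector every 4th time step

for epoch in range(2000):
    state = np.zeros(n_state)                       # initialize to zero
    for target in teachers:
        for t in range(spacing):
            h = sigmoid(W1 @ state)
            y = sigmoid(W2 @ h)
            if t == spacing - 1:                    # teaching step
                dy = (target - y) * y * (1.0 - y)   # output error signal
                dh = (W2.T @ dy) * h * (1.0 - h)    # back-propagated error
                W2 += lr * np.outer(dy, h)          # incremental weight
                W1 += lr * np.outer(dh, state)      # adjustments
            state = y                # free-running between teaching steps

state = np.zeros(n_state)           # test: run free from the zero state
for _ in teachers:
    for t in range(spacing):
        y = sigmoid(W2 @ sigmoid(W1 @ state))
        state = y
    print(np.round(y, 2))           # outputs at the teaching times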

10One possibility is to construct explicitly a set of serial mini-networks that could produce sequentially cohesive, multigesture units. Then a higher order net could be trained to produce utterance-specific sequences of such units (e.g., Jordan, 1985). It is also possible that multigesture units could arise spontaneously as emergent consequences of the learning-phase dynamics of connectionist, serial-dynamic networks that are trained to produce orchestrated patterns of the simpler gestural components (e.g., Grossberg, 1986; Miyata, 1987, 1988). This is clearly an important area to be explored in the development of our hybrid dynamical model of speech production (see the section entitled Toward a Hybrid Dynamical Model).

11Transillumination signals are uncalibrated in terms of spatial measurement scale. Consequently, amplitude differences in glottal gestures are only suggested, not demonstrated, by corresponding differences in transillumination signal size. Temporal differences (e.g., durations, glottal peak timing) and the spatiotemporal shape (e.g., one vs. two peaks) of transillumination signals are reliable indices/reflections of gestural kinematics.

12It is likely that rate and stress manipulations also have systematic effects on oral-glottal coordination. We make no claims regarding these potential effects in this article, however.


APPENDIX 1

Tract-variable dynamical system

The tract-variable equations of motion are defined in matrix form as follows:

$\ddot{z} = M^{-1}(-B\dot{z} - K\Delta z)$,  (A1)

where z = the m x 1 vector of current tract-variable positions, with components $z_i$ listed in Figure 4; $\dot{z}$, $\ddot{z}$ = the first and second derivatives of z with respect to time; M = a m x m diagonal matrix of inertial coefficients, in which each diagonal element, $m_{ii}$, is associated with the ith tract variable; B = a m x m diagonal matrix of tract-variable damping coefficients; K = a m x m diagonal matrix of tract-variable stiffness coefficients; and $\Delta z = z - z_0$, where $z_0$ = the target or rest position vector for the tract variables.

By defining the M, B, and K matrices as diagonal, the equations in (A1) are uncoupled. In this sense, the tract variables are assumed to represent independent modes of articulatory behavior that do not interact dynamically (see Coker, 1976, for a related use of articulatory modes). In current simulations, M is assumed to be constant and equal to the identity matrix ($m_{ij}$ = 1.0 for i = j, otherwise $m_{ij}$ = 0.0), whereas the components of B, K, and $z_0$ vary during a simulated utterance according to the ongoing set of gestures being produced. For example, different vowel gestures are distinguished in part by corresponding differences in target positions for the associated set of tongue-dorsum point attractors. Similarly, vowel and consonant gestures are distinguished in part by corresponding differences in stiffness coefficients, with vowel gestures being slower (less stiff) than consonant gestures. Thus, Equation (A1) describes a linear system of tract-variable equations with time-varying coefficients, whose values are functions of the currently active gesture set (see the Parameter Tuning subsection of the text section Active Gestural Control: Tuning and Gating for a detailed account of this coefficient specification process). Note that simulations reported previously in Saltzman (1986) and Kelso et al. (1986a, 1986b) were restricted to either single "isolated" gestures, or synchronous pairs of gestures defined across different tract variables, e.g., single bilabial closures, or synchronous "vocalic" tongue-dorsum and "consonantal" bilabial gestures. In these instances, the coefficient matrices and vector parameters in Equation (A1) remained constant (time-invariant) throughout each such gesture set.
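To make Equation (A1) concrete, here is a minimal numerical sketch assuming illustrative values throughout: two uncoupled tract variables, invented targets, critical damping, simple Euler integration, and M equal to the identity matrix, as in the simulations described above.

import numpy as np

# Minimal sketch of the point-attractor dynamics of Equation (A1).
m = 2                                  # number of tract variables
M_inv = np.eye(m)                      # M = identity, so M^-1 = identity
K = np.diag([40.0, 160.0])             # "vowel-like" (less stiff) vs.
                                       # "consonant-like" (stiffer) gesture
B = 2.0 * np.sqrt(K)                   # critical damping (for unit mass)
z0 = np.array([1.0, 0.0])              # gestural targets (rest positions)

z = np.array([0.0, 1.0])               # initial tract-variable positions
zdot = np.zeros(m)
dt = 0.001

for step in range(2000):               # 2 seconds of simulated time
    zddot = M_inv @ (-B @ zdot - K @ (z - z0))   # Equation (A1)
    zdot += dt * zddot
    z += dt * zdot

print(np.round(z, 3))                  # both variables settle near z0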

APPENDIX 2

Model articulator dynamical system; orthogonal projection operator

A dynamical system for controlling the model articulators is specified by expressing tract variables (z, $\dot{z}$, $\ddot{z}$) as functions of the corresponding model articulator variables ($\phi$, $\dot{\phi}$, $\ddot{\phi}$). The tract variables of Equation (A1) are transformed into model articulator variables using the following direct kinematic relationships:

$z = z(\phi)$,  (A2a)

$\dot{z} = J(\phi)\dot{\phi}$,  (A2b)

$\ddot{z} = J(\phi)\ddot{\phi} + \dot{J}(\phi, \dot{\phi})\dot{\phi}$,  (A2c)

where $\phi$ = the n x 1 vector of current articulator positions, with components $\phi_i$ listed in Figure 4; $z(\phi)$ = the current m x 1 tract-variable position vector expressed as a function of the current model articulator configuration. These functions are specific to the particular geometry assumed for the set of model articulators used to simulate speech gestures or produce speech acoustics via articulatory synthesis. $J(\phi)$ = the m x n Jacobian transformation matrix whose elements $J_{ij}$ are partial derivatives, $\partial z_i / \partial \phi_j$, evaluated at the current $\phi$. Thus, each row i of the Jacobian represents the set of changes in the ith tract variable resulting from unit changes in all the articulators; and $\dot{J}(\phi, \dot{\phi}) = dJ(\phi)/dt$, a m x n matrix resulting from differentiating the elements of $J(\phi)$ with respect to time. The elements of $\dot{J}$ are functions of both the current $\phi$ and $\dot{\phi}$. The elements of J and $\dot{J}$ thus reflect the geometrical relationships among motions of the model articulators and motions of the corresponding tract variables. Using the direct kinematic relationships in Equation (A2), the equation of motion derived for the actively controlled model articulators is as follows:

$\ddot{\phi}_A = J^*(M^{-1}[-BJ\dot{\phi} - K\Delta z(\phi)]) - J^*\dot{J}\dot{\phi}$,  (A3)

where $\ddot{\phi}_A$ = an articulatory acceleration vector representing the active driving influences on the model articulators; M, B, K, J, and $\dot{J}$ are the same matrices used in Equations (A1) and (A2); and $\Delta z(\phi) = z(\phi) - z_0$, where $z_0$ = the same constant vector used in Equation (A1). It should be noted that because $\Delta z$ in Equations (A1) and (A3) is not assumed to be "small," a differential approximation $dz = J(\phi)d\phi$ is not justified and, therefore, Equation (A2a) was used instead for the kinematic displacement transformation into model articulator variables. $J^*$ = a n x m weighted Jacobian pseudoinverse (e.g., Benati, Gaglio, Morasso, Tagliasco, & Zaccaria, 1980; Klein & Huang, 1983; Whitney, 1972), $J^* = W^{-1}J^T(JW^{-1}J^T)^{-1}$, where W is a n x n positive definite articulatory weighting matrix whose elements are constant during a given isolated gesture, and superscript T denotes the vector or matrix transpose operation. The pseudoinverse is used because there are a greater number of model articulator variables than tract variables for this task. More specifically, using $J^*$ provides a unique, optimal least squares solution for the redundant (e.g., Saltzman, 1979) differential transformation from tract variables to model articulator variables that is weighted according to the pattern of elements in the W-matrix. In current modeling, the W-matrix is defined to be of diagonal form, in which element $w_{ii}$ is associated with articulator i. A given set of articulator weights implements a corresponding pattern of constraints on the relative motions of the articulators during a given gesture. The motion of a given articulator is constrained in direct proportion to the magnitude of the corresponding weighting element relative to the remaining weighting elements. Intuitively, then, the elements of W establish a gesture-specific pattern of relative "receptivities" among the articulators to the driving influences generated in the tract-variable state space. In the present model, $J^*$ has been generalized to a form whose elements are gated functions of the currently active gesture set (see the Transformation gating subsection of the text section Active gestural control: Tuning and gating for details).
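The algebra of $J^*$ and Equation (A3) can be illustrated directly. In the sketch below only the formulas follow this appendix; the 2 x 3 geometry, the Jacobian entries, and all parameter values are invented, and $\dot{J}$ is set to zero for brevity.

import numpy as np

# Weighted Jacobian pseudoinverse and the acceleration field of (A3).
m, n = 2, 3
J = np.array([[1.0, 0.5, 0.2],        # dz_i/dphi_j at the current phi
              [0.0, 1.0, 0.8]])
W = np.diag([4.0, 1.0, 1.0])          # heavier weight -> articulator 1
                                      # is more constrained (used less)
W_inv = np.linalg.inv(W)

# J* = W^-1 J^T (J W^-1 J^T)^-1, the weighted least-squares inverse
J_star = W_inv @ J.T @ np.linalg.inv(J @ W_inv @ J.T)

M_inv = np.eye(m)
B = np.diag([12.0, 25.0])
K = np.diag([40.0, 160.0])
phidot = np.array([0.1, -0.2, 0.05])  # current articulator velocities
dz = np.array([0.3, -0.1])            # current Delta-z(phi) = z(phi) - z0
Jdot = np.zeros((m, n))               # dJ/dt, taken as zero for brevity

# Equation (A3): active driving accelerations on the model articulators
phiddot_A = J_star @ (M_inv @ (-B @ (J @ phidot) - K @ dz)) \
            - J_star @ (Jdot @ phidot)
print(np.round(phiddot_A, 3))

# Sanity check: J J* is the m x m identity (J* is a right inverse of J)
print(np.round(J @ J_star, 6))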

In Equation (A3), the first and second terms inside the inner parentheses on the right hand side represent the articulatory acceleration components due to system damping ($-BJ\dot{\phi}$) and stiffness ($-K\Delta z(\phi)$), respectively. The rightmost term on the right hand side represents an acceleration component vector ($-J^*\dot{J}\dot{\phi}$) that is nonlinearly proportional to the squares and pairwise products of current articulatory velocities (e.g., $(\dot{\phi}_2)^2$, $\dot{\phi}_1\dot{\phi}_2$, etc.; for further details, see Kelso et al., 1986a, 1986b; Saltzman, 1986; Saltzman & Kelso, 1987).

In early simulations of unperturbed discrete speech gestures (e.g., bilabial closure) it was found that, after a given gestural target (e.g., degree of lip compression) was attained and maintained at a steady value, the articulators continued to move with very small but non-negligible (and undesirable) velocities. In essence, the model added to the articulator movements just those patterns that resulted in no tract-variable (e.g., lip aperture) motion above and beyond that demanded by the task. The source of this residual motion was ascertained to reside in the nonconservative nature of the pseudoinverse ($J^*$; see Equation [A3]) of the Jacobian transformation (J) used to relate tract-variable motions and model articulator motions (Klein & Huang, 1983). By nonconservative, we mean that a closed path in tract-variable space does not imply generally a closed path in model articulator space.

These undesired extraneous model-articulator motions were eliminated by including supplementary dissipative forces proportional to the articulatory velocities. Specifically, the orthogonal projection operator, $(I_n - J^*J)$, where $I_n$ is a n x n identity matrix (Baillieul, Hollerbach & Brockett, 1984; Klein & Huang, 1983), was used in the following augmented form of Equation (A3):

$\ddot{\phi}_A = J^*(M^{-1}[-BJ\dot{\phi} - K\Delta z(\phi)]) - J^*\dot{J}\dot{\phi} + (I_n - J^*J)\ddot{\phi}_d$,  (A4)

where $\ddot{\phi}_d = -B_N\dot{\phi}$ represents an acceleration damping vector, and $B_N$ is a n x n diagonal matrix whose components, $b_{jj}$, serve as constant damping coefficients for the jth component of $\dot{\phi}$. The subscript N denotes the fact that $B_N$ is the same damping matrix as that used in the articulatory neutral attractor (see the text section on Nonactive Gestural Control, Equations [4] and [5]).

Using Equation (A4), the model generates movements to tract-variable targets with no residual motions in either tract-variable or model-articulator coordinates. Significantly, the model works equally well for both the case of unperturbed gestures, and the case in which gestures are perturbed by simulated external mechanical forces (see the text section Gestural primitives). In the present model, the identity matrix ($I_n$) in Equation (A4) has been generalized, like $J^*$, to a form whose elements are gated functions of the currently active gesture set (see the Transformation gating subsection of the text section Active gestural control: Tuning and gating).
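The effect of the added term is easy to verify numerically: because $J(I_n - J^*J) = 0$, the projected damping is invisible to the tract variables. A sketch, reusing the invented geometry from the previous example:

import numpy as np

# Null-space damping term of Equation (A4): the operator (I_n - J* J)
# projects articulator velocities onto motions that produce no
# tract-variable motion, so damping them removes residual drift
# without disturbing the task. Geometry and values are again invented.
n = 3
J = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.8]])
W_inv = np.linalg.inv(np.diag([4.0, 1.0, 1.0]))
J_star = W_inv @ J.T @ np.linalg.inv(J @ W_inv @ J.T)

B_N = 10.0 * np.eye(n)                 # equal damping for all articulators
phidot = np.array([0.1, -0.2, 0.05])

P = np.eye(n) - J_star @ J             # orthogonal projection operator
phiddot_d = -B_N @ phidot              # acceleration damping vector
null_space_term = P @ phiddot_d        # added to (A3) to give (A4)

print(np.round(J @ P, 10))             # numerically ~zero: J P = 0
print(np.round(null_space_term, 4))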

The damping coefficients of $B_N$ are typically assigned equal values for all articulators. This results in synchronous movements (varying in amplitudes) for the tract variables and articulators involved in isolated gestures. Interesting patterns emerge, however, if the coefficients are assumed to be unequal for the various articulators (Saltzman et al., 1987). For example, the relatively sluggish rotations of the jaw or horizontal motions of the lips may be characterized by larger time constants than the lips' relatively brisk vertical motions. Implementing these asymmetries in $B_N$, interarticulator asynchronies within single speech gestures are generated by the model that mirror, partially, some patterns reported in the literature. For example, Gracco and Abbs (1986) showed that, during bilabial closing gestures for the first /p/ in /sæpæpl/, the raising onsets and peak velocities of the component articulatory movements occur in the order: upper lip, lower lip, and jaw. The peak velocities conform to this order more closely than the raising onsets. In current simulations of isolated bilabial gestures, the asynchronous pattern of the peak velocities (but not the movement onsets) emerges naturally when the elements of $B_N$ are unequal. Interestingly, the tract-variable trajectories are identical to those generated when the $B_N$ elements are equal. Additional simulations have revealed that patterns of closing onsets may become asynchronous, however, depending on several factors, e.g., the direction and magnitude of the jaw's velocity prior to the onset of the closing gesture.

APPENDIX 3

Competitive network equations for parameter tuning

The postblending activation strengths ($\rho_{Tik}$ and $\rho_{Wikj}$) defined in text Equation (2) are given by the steady-state solutions to a set of feedforward, competitive-interaction-network dynamical equations (e.g., Grossberg, 1986) for the preblending activation strengths ($a_{ik}$) in the present model. These equations are expressed as follows:

$\dot{\rho}_{Tik} = -a_{ik}(\rho_{Tik} - B_\rho) - \rho_{Tik}\left[(B_a - a_{ik}) + \sum_{j \neq k} \alpha_{ikj}\, a_{ij}\right]$,  (A5a)

and

$\dot{\rho}_{Wikj} = -a_{ik}(\rho_{Wikj} - B_\rho) - \rho_{Wikj}\left[(B_a - a_{ik}) + \sum_{l \neq k} \beta_{ikjl}\, a_{il}\right]$,  (A5b)

where $B_a$ and $B_\rho$ denote the maximum values allowed for the preblending and postblending activation strengths, respectively. In current modeling, $B_\rho$ and $B_a$ are defined to equal 1.0; and the $\alpha$ and $\beta$ coefficients are the lateral inhibition and "gatekeeper" coefficients, respectively, defined in text Equation (2).

The solutions to Equations (A5a) and (A5b) are obtained by setting their left-hand sides to zero, and solving for $\rho_{Tik}$ and $\rho_{Wikj}$, respectively. These solutions are expressed in Equations (2a) and (2b). The dynamics of Equation (A5) are assumed to be "fast" relative to the dynamics of the interarticulator coordination level (Equations [A3] and [A4]). Consequently, incorporating the solutions of Equation (A5) directly into Equation (1) is viewed as a justified computational convenience in the present model (see also Grossberg & Mingolla, 1986, for a similar computational simplification).
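A small sketch of this steady-state computation, assuming the shunting form given above for (A5a) together with invented activation values and a uniform lateral-inhibition coefficient, shows how competition normalizes the blending weights:

import numpy as np

# Steady-state of (A5a): setting the left-hand side to zero and solving
# gives rho = a * B_rho / (B_a + sum of inhibiting activations).
B_rho, B_a = 1.0, 1.0          # maximum activation strengths
a = np.array([0.9, 0.6, 0.2])  # preblending strengths a_ik of three
                               # gestures competing for one tract variable
alpha = 1.0                    # uniform lateral-inhibition coefficient

rho = np.empty_like(a)
for k in range(len(a)):
    inhibition = alpha * (a.sum() - a[k])          # sum over j != k
    rho[k] = a[k] * B_rho / (a[k] + (B_a - a[k]) + inhibition)

print(np.round(rho, 3))   # strongly active gestures dominate the blend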


Haskins Laboratories Status Report on Speech Research
1989, SR-99/100, 69-101

Articulatory Gestures as Phonological Units*

Catherine P. Browman and Louis Goldstein†

*This paper has benefited from criticisms by Cathi Best, Alice Faber, Elliot Saltzman, Michael Studdert-Kennedy, Eric Vatikiotis-Bateson, Doug Whalen and two anonymous reviewers. Our thanks to Mark Tiede for help with manuscript preparation, and Zefang Wang for help with the graphics. This work was supported by NSF grant BNS-8520709 and NIH grants HD-01994 and NS-13617 to Haskins Laboratories.

1 A GESTURAL PHONOLOGY

Over the past few years, we have been investigating a particular hypothesis about the nature of the basic 'atoms' out of which phonological structures are formed. The atoms are assumed to be primitive actions of the vocal tract articulators that we call 'gestures.' Informally, a gesture is identified with the formation (and release) of a characteristic constriction within one of the relatively independent articulatory subsystems of the vocal tract (i.e., oral, laryngeal, velic). Within the oral subsystem, constrictions can be formed by the action of one of three relatively independent sets of articulators: the lips, the tongue tip/blade, and the tongue body. As actions, gestures have some intrinsic time associated with them; they are characterizations of movements through space and over time (see Fowler et al., 1980).

Within the view we are developing, phonological structures are stable 'constellations' (or 'molecules', to avoid mixing metaphors) assembled out of these gestural atoms. In this paper, we examine some of the evidence for, and some of the consequences of, the assumption that gestures are the basic atoms of phonological structures. First, we attempt to establish that gestures are pre-linguistic discrete units of action that are inherent in the maturation of a developing child and that therefore can be harnessed as elements of a phonological system in the course of development (§ 1.1). We then give a more detailed, formal characterization of gestures as phonological units within the context of a computational model (§ 1.2), and show that a number of phonological regularities can be captured by representing constellations of gestures (each having inherent duration) using gestural scores (§ 1.3). Finally, we show how the proposed gestural structures relate to proposals of feature geometry (§§ 2-3).

1.1 Gestures as pre-linguistic primitives

Gestures are units of action that can be identified by observing the coordinated movements of vocal tract articulators. That is, repeated observations of the production of a given utterance will reveal a characteristic pattern of constrictions being formed and released. The fact that these patterns of (discrete) gestures are similar in structure to the nonlinear phonological representations currently being postulated (e.g., Clements, 1985; Hayes, 1986; Sagey, 1986), together with some of the evidence presented in Browman and Goldstein (1986, in press), leads us to make the strong hypothesis that gestures themselves constitute basic phonological units. This hypothesis has the attractive feature that the basic units of phonology can be identified directly with cohesive patterns of movement within the vocal tract. Thus, the phonological system is built out of inherently discrete units of action. This state of affairs would be particularly useful for a child learning to speak. If we assume that discrete gestures (like those that will eventually function as phonological units) emerge in the child's behavioral repertoire in advance of any specifically linguistic development, then it is possible to view phonological development as harnessing these action units to be the basic units of phonological structures.

The idea that pre-linguistic gestures are employed in the service of producing early words has been proposed and supported by a number of writers, for example, Fry (1966, in Vihman, in press), Locke (1983), Studdert-Kennedy (1987) and Vihman (in press), where what we identify as 'gestures' are referred to as 'articulatory routines' or the like. The view we are proposing extends this approach by hypothesizing that these pre-linguistic gestures actually become the units of contrast. Additional phonological developments involve differentiating and tuning the gestures, and developing patterns of intergestural coordination that correspond to larger phonological structures.



The evidence that gestures are pre-linguistic units of action can be seen in the babbling behavior of young infants. The descriptions of infant babbling (ages 6-12 months) suggest a predominance of what are transcribed as simple CV syllables (Locke, 1986; Oller & Eilers, 1982). The 'consonantal' part of these productions can be analyzed as simple, gross, constriction maneuvers of the independent vocal tract subsystems and (within the oral subsystem) the separate oral articulator sets. For example, based on frequency counts obtained from a number of studies, Locke (1983) finds that the consonants in (1) constitute the 'core' babbling inventory: these 12 'consonants' account for about 95% of the babbles of English babies. Similar frequencies obtain in other language environments.

(1) h b d g ʔ t k m n j w s

These transcriptions are not meant to be either systematic phonological representations (the child doesn't have a phonology yet), or narrow phonetic transcriptions (the child cannot be producing the detailed units of its 'target' language, because, as noted below, there do not seem to be systematic differences in the babbles produced by infants in different language environments). Others have noted the problems inherent in using a transcription that assumes a system of units and relations to describe a behavior that lacks such a system (e.g., Kent & Murray, 1982; Koopmans-van Beinum & van der Stelt, 1986; Oller, 1986; Studdert-Kennedy, 1987). As Studdert-Kennedy (1987) argues, it seems likely that these transcriptions reflect the production by the infant of simple vocal constriction gestures, of the kind that evolve into mature phonological structures (which is why adults can transcribe them using their phonological categories). Thus, /h/ can be interpreted as a laryngeal widening gesture and /b d g/ as 'gross' constriction gestures of the three independent oral articulator sets (lips, tongue tip, and tongue body). /p t k/ combine the oral constriction gestures with the laryngeal maneuver, and /m n/ combine oral constrictions with velic lowering. These combinations do not necessarily indicate an ability on the part of the infant to coordinate the gestures. Rather, any accidental temporal coincidence of two such gestures would be perceived by the listener as the segments in question.

The analysis outlined above suggests that babbling involves the emergence, in the infant, of simple constriction gestures of independent parts of the vocal tract. As argued by Locke (1986), the pattern of emergence of these actions can be viewed as a function of anatomical and neurophysiological developments, rather than the beginning of language acquisition per se. This can be seen, first of all, in the fact that the babbling inventory and its developmental sequence have not been shown to vary as a function of the particular language environment in which the child finds itself (although individual infants may vary considerably from one another in the relative frequencies of particular gestures: Studdert-Kennedy, 1987; Vihman, in press). In fact, in the large number of studies reviewed by Locke (1983), there appear to be no detectable differences (either instrumentally or perceptually) in the 'consonantal' babbling of infants reared in different language environments. (More recent studies have found some language environment effect on the overall long-term spectrum of vocalic utterances, de Boysson-Bardies et al., 1986, and on prosody, de Boysson-Bardies et al., 1984. Other subtle effects may be uncovered with improvement of analytic techniques.)

Secondly, Locke (1983) notes that the developmental changes in frequency of particular babbled consonants can likely be explained by anatomical developments. Most of the consonants produced by very young infants (less than six months) involve tongue body constrictions, usually transcribed as velars. Some time shortly after the beginning of repetitive canonical babbling (usually in the seventh month), tongue tip and lip constrictions begin to outnumber tongue body constrictions, with tongue tip constrictions eventually dominating. Even deaf infants show a progression that is qualitatively similar, at least in early stages, although their babbling can be distinguished from that of hearing infants on a number of acoustic measures (Oller & Eilers, 1988). Locke suggests an explanation in terms of vocal tract maturation. At birth, the infant's larynx is high, and the tongue virtually fills the oral cavity (Lieberman, 1984). This would account for the early dominance of tongue body constrictions. After the larynx drops, tongue tip and lip constrictions (without simultaneous tongue constrictions) are more readily formed. In particular, the closing action of the mandible will then contribute to constrictions at the front of the mouth.



Finally, Locke (1986) notes that the timing of the development of repetitive 'syllabic' babbling coincides with the emergence of repetitive motor behaviors generally. He cites Thelen's (1981) observation of 47 different rhythmic activities that have their peak frequency at 6-7 months. Locke concludes (1986, p. 145) that 'it thus appears that the syllabic patterning of babble, like the phonetic patterning of its segments, is determined mostly by nonlinguistic developments of vocal tract anatomy and neurophysiology.'

The pre-linguistic vocal gestures become linguistically significant when the child begins to produce its first few words. The child seems to notice the similarity of its babbled patterns to the speech s/he hears (Locke, 1986), and begins to produce 'words' (with apparent referential meaning) using the available set of vocal gestures. It is possible to establish that there is a definite relationship between the (nonlinguistic) gestures of babbling and the gestures employed in early words by examining individual differences among children. Vihman et al. (1985) and Vihman (in press) find that the particular consonants that were produced with high frequency in the babbling of a given child also appear with high frequency in that child's early word productions. Thus, the child is recruiting its well-practiced action units for a new task. In fact, in some early cases (e.g., 'baby' words like mama, etc.), 'recruiting' is too active a notion. Rather, parents are helping the child establish a referential function with sequences that already exist as part of the babbling repertoire (Locke, 1986).

Once the child begins producing words (complex units that have to be distinguished one from another) using the available gestures as building blocks, phonology has begun to form. If we compare the child's early productions (using the small set of pre-linguistic gestures) to the gestural structure of the adult forms, it is clear that there are (at least) two important developments that are required to get from one to the other: (1) differentiation and tuning of individual gestures and (2) coordination of the individual gestures belonging to a given word. Let us examine these in turn.

Differentiation and tuning. While the repertoire of gestures inherent in the consonants of (1) above employs all of the relatively independent articulator sets, the babbled gestures involve just a single (presumably gross) movement. For example, some kind of closure is involved for oral constriction gestures. In general, however, languages employ gestures produced with a given articulator set that contrast in the degree of constriction. That is, not only are closure gestures produced but also fricative and wider (approximant) gestures. In addition, the exact location of the constriction formed by a given articulator set may contrast, e.g., in the gestures for /θ/, /s/ and /ʃ/. Thus, a single pre-linguistic constriction gesture must eventually differentiate into a variety of potentially contrastive gestures, tuned with different values of constriction location and degree. Although the location and degree of the constriction formed by a given articulator set are, in principle, physical continua, the differentiated gestures can be categorically distinct. The partitioning of these continua into discrete categories is likely aided by quantal (i.e., nonlinear) articulatory-auditory relations of the kind proposed by Stevens (1972, 1989). In addition, Lindblom (1986) has shown how the pressures to keep contrasting words perceptually distinct can lead to discrete clustering along some articulatory/acoustic continua. Even so, the process of differentiation may be lengthy. Nittrouer, Studdert-Kennedy, and McGowan (1989) present data on fricative production in American English children. They find that differentiation between /s/ and /ʃ/ is increasing in children from ages three to seven, and hasn't yet reached the level shown by adults. In addition, tuning may occur even where differentiation is unnecessary. That is, even if a language has only a single tongue tip closure gesture, its constriction location may be tuned to a particular language-specific value. For example, English stops have an alveolar constriction location, while French stops are more typically dental.

Coordination. The various gestures that constitute the atoms of a given word must be organized appropriately. There is some evidence that a child can know what all the relevant gestures are for some particular word, and can produce them all, without either knowing or being able to produce the appropriate organization. Studdert-Kennedy (1987) presents an example of this kind from Ferguson and Farwell (1975). They list ten attempts by a 15-month-old girl to say the word pen in a half-hour session, as shown in (2):

(2) [m4a, 'A, dedn, hm, may, phin, thahatha, bah, 4haun, bus]

While these attempts appear radically different, they can be analyzed, for the most part, as the set of gestures that constitute pen, misarranged in various ways: glottal opening, bilabial closure, tongue body lowering, alveolar closure and velum lowering. Eventually, the 'right' organization is hit upon by the child. The search is presumably aided by the fact that the coordinated structure embodied in the target language is one of a relatively small number of dynamically stable patterns (see also Boucher, 1988). The formation of such stable patterns may ultimately be illuminated by research into stable modes in coordinated human action in general (e.g., Haken et al., 1985; Schmidt et al., 1987) being conducted within the broad context of the non-linear dynamics relevant to problems of pattern formation in physics and biology (e.g., Glass & Mackey, 1988; Thompson & Stewart, 1986). In addition, aspects of coordination may emerge as the result of keeping a growing number of words perceptually distinct using limited articulatory resources (Lindblom et al., 1983).



If phonological structures are assumed to be organized patterns of gestural units, a distinct methodological bonus obtains: the vocal behavior of infants, even pre-linguistic behavior, can be described using the same primitives (discrete units of vocal action) that are used to describe the fully elaborated phonological system of adults. This allows the growth of phonological form to be precisely monitored, by observing the development of the primitive gestural structures of infants into the elaborated structures of adults (Best & Wilkenfeld, 1988, provide an example of this). In addition, some of the thorny problems associated with the transcription of babbling can be obviated. Discrete units of action are present in the infant, and can be so represented, even if adult-like phonological structures have not yet developed. A similar advantage applies to describing various kinds of 'disordered' speech (e.g., Kent, 1983; Marshall et al., 1988), which may lack the organization shown in 'normal' adult phonology, making conventional phonological/phonetic transcriptions inappropriate, but which may, nevertheless, be composed of gestural primitives. Of course, all this assumes that it is possible to give an account of adult phonology using gestures as the basic units; it is to that account that we now turn.

1.2 The nature of phonological gestures

In conjunction with our colleagues Elliot Saltzman and Philip Rubin at Haskins Laboratories, we are developing a computational model that produces speech beginning with a representation of phonological structures in terms of gestures (Browman et al., 1984; Browman et al., 1986; Browman & Goldstein, 1987; Saltzman et al., 1987), where a gesture is an abstract characterization of coordinated task-directed movements of articulators within the vocal tract. Each gesture is precisely defined in terms of the parameters of a set of equations for a 'task-dynamic' model (Saltzman, 1986; Saltzman & Kelso, 1987). When the control regime for a given gesture is active, the equations regulate the coordination of the model's articulators in such a way that the gestural 'task' (the formation of a specified constriction) is reached as the articulator motions unfold over time. Acoustic output is obtained from these articulator motions by means of an articulatory synthesizer (Rubin et al., 1981). The gestures for a given utterance are themselves organized into a larger coordinated structure, or constellation, that is represented in a gestural score (discussed in § 1.3). The score specifies the sets of values of the dynamic parameters for each gesture, and the temporal intervals during which each gesture is active. While we use analyses of articulatory movement data to determine the parameter values for the gestures and gestural scores, there is nevertheless a striking convergence between the structures we derive through these analyses and phonological structures currently being proposed in other frameworks (e.g., Anderson & Ewen, 1987; Clements, 1985; Ewen, 1982; Lass, 1984; McCarthy, 1989; Plotkin, 1976; Sagey, 1986).

Within task dynamics, the goal for a given gesture is specified in terms of independent task dimensions, called vocal tract variables. Each tract variable is associated with the specific set of articulators whose movements determine the value of that variable. For example, one such tract variable is Lip Aperture (LA), corresponding to the vertical distance between the two lips. Three articulators can contribute to changing LA: the jaw, vertical displacement of the lower lip with respect to the jaw, and vertical displacement of the upper lip. The current set of tract variables in the computational model, and their associated articulators, can be seen in Figure 1. Within the task dynamic model, the control regime for a given gesture coordinates the ongoing movements of these articulators in a flexible, but task-specific manner, according to the demands of other concurrently active gestures. The motion associated with each of a gesture's tract variables is specified in terms of an equation for a second-order dynamical system.1 The equilibrium position parameter of the equation (x0) specifies the tract variable target that will be achieved, and the stiffness parameter k specifies (roughly) the time required to get to the target. These parameters are tuned differently for different gestures. In addition, their values can be modified by stress.


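To make the dynamics concrete, the following minimal sketch (our illustration, not the Haskins implementation; parameter values and units are invented) integrates one critically damped second-order equation, driving a tract variable toward its target x0 with stiffness k setting the time scale of the movement:

import math

def simulate_gesture(x_init, x0, k, m=1.0, dt=0.001, dur=0.3):
    """Integrate m*x'' + b*x' + k*(x - x0) = 0 with critical damping,
    so the tract variable approaches x0 without overshoot ('ringing')."""
    b = 2.0 * math.sqrt(m * k)           # critical damping coefficient
    x, v = x_init, 0.0
    trajectory = []
    for _ in range(int(dur / dt)):
        a = (-b * v - k * (x - x0)) / m  # acceleration from the dynamics
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

# e.g., lip aperture (LA) closing from 10 mm toward a closure target of 0 mm;
# a stiffer gesture (larger k) reaches its target sooner.
la_trajectory = simulate_gesture(x_init=10.0, x0=0.0, k=400.0)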

Gestures are currently specified in terms of one or two tract variables. Velic gestures involve a single tract variable of aperture size, as do glottal gestures. Oral gestures involve pairs of tract variables that specify the constriction degree (LA, TTCD, and TBCD) and constriction location (LP, TTCL, and TBCL). For simplicity, we will refer to the sets of articulators involved in oral gestures using the name of the end-effector, that is, the name of the single articulator at the end of the chain of articulators forming the particular constriction: the LIPS for LA and LP, the tongue tip (TT) for gestures involving TTCD and TTCL, and the tongue body (TB) for gestures involving TBCD and TBCL. As noted above, each tract variable is modelled using a separate dynamical equation; however, at present the paired tract variables use identical stiffness and are activated and de-activated simultaneously. The damping parameter b for oral gestures is always set for critical damping: the gestures approach their targets, but do not overshoot them, or 'ring.' Thus, a given oral gesture is specified by the values of three parameters: target values for each of a pair of tract variables, and a stiffness value (used for both equations).

This set of tract variables is not yet complete, of course. Other oral tract variables that need to be implemented include an independent tongue root variable (Ladefoged & Halle, 1988), and (as discussed in § 2.1) variables for controlling the shape of TT and TB constrictions as seen in the third dimension. Additional laryngeal variables are required to allow for pitch control and for vertical movement of the larynx, required, for example, for ejectives and implosives.

tract variable                               articulators involved
LP    lip protrusion                         upper & lower lips, jaw
LA    lip aperture                           upper & lower lips, jaw
TTCL  tongue tip constrict location          tongue tip, body, jaw
TTCD  tongue tip constrict degree            tongue tip, body, jaw
TBCL  tongue body constrict location         tongue body, jaw
TBCD  tongue body constrict degree           tongue body, jaw
VEL   velic aperture                         velum
GLO   glottal aperture                       glottis

Figure 1. Tract variables and contributing articulators of the computational model.
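The Figure 1 mapping can be written directly as data; this sketch (the dictionary form is ours, for illustration) records each tract variable's name and the articulators whose movements determine its value:

TRACT_VARIABLES = {
    "LP":   ("lip protrusion",                 ["upper lip", "lower lip", "jaw"]),
    "LA":   ("lip aperture",                   ["upper lip", "lower lip", "jaw"]),
    "TTCL": ("tongue tip constrict location",  ["tongue tip", "tongue body", "jaw"]),
    "TTCD": ("tongue tip constrict degree",    ["tongue tip", "tongue body", "jaw"]),
    "TBCL": ("tongue body constrict location", ["tongue body", "jaw"]),
    "TBCD": ("tongue body constrict degree",   ["tongue body", "jaw"]),
    "VEL":  ("velic aperture",                 ["velum"]),
    "GLO":  ("glottal aperture",               ["glottis"]),
}

def articulators_for(tract_variable):
    """Articulators whose movements determine the given tract variable."""
    return TRACT_VARIABLES[tract_variable][1]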


The representations of gestures employing distinct sets of tract variables are categorically distinct within the system outlined here. That is, they are defined using different variables that correspond to different sets of articulators. They provide, therefore, an inherent basis for contrast among gestures (Browman & Goldstein, in press). However, for contrasting gestures that employ the same tract variables, the difference between the gestures is in the tuned values of the continuous dynamic parameters (for oral gestures: constriction degree, location and stiffness). That is, unlike the articulator sets being used, the dynamic parameters do not inherently define categorically distinct classes. Nonetheless, we assume that there are stable ranges of parameter values that tend to contrast with one another repeatedly in languages (Ladefoged & Maddieson, 1986; Vatikiotis-Bateson, 1988). The discrete values might be derived using a combination of articulatory and auditory constraints applied across the entire lexicon of a language, as proposed by Lindblom et al. (1983). In addition, part of the basis for the different ranges might reside in the nonlinear relation between the parameter values and their acoustic consequences, as in Stevens' quantal theory (Stevens, 1972, 1989). In order to represent the contrastive ranges of gestural parameter values in a discrete fashion, we employ a set of gestural descriptors. These descriptors serve as pointers to the particular articulator set involved in a given gesture, and to the numerical values of the dynamical parameters characterizing the gestures. In addition, they can act as classificatory and distinctive features for the purposes of lexical and phonological structure. Every gesture can be specified by a distinct descriptor structure. This functional organization can be formally represented as in (3), which relates the parameters of the dynamical equations to the symbolic descriptors. Contrasting gestures will differ in at least one of these descriptors.

(3) Gesture = articulator set (constriction degree, constriction location, constriction shape, stiffness)

Constriction Degree is always present, and refers to the x0 value for the constriction degree tract variables (LA, TTCD, TBCD, VEL, or GLO).

Constriction Location is relevant only for oral gestures, and refers to the x0 value for the constriction location tract variables (LP, TTCL, or TBCL).

Constriction Shape is relevant only for oral gestures, and refers to the x0 value of constriction shape tract variables. It is not currently implemented.

Stiffness refers to the k value of the tract variables.

Figure 2 displays the inventory of articulator sets and associated parameters that we posit are required for a general gestural phonology. The parameters correspond to the particular tract variables of the model shown below them. Those parameters with asterisks are not currently implemented.

Gestures: Articulator (Dimensions)

LIPS (con degree, con location, , stiffness)              LA    LP
TT   (con degree, con location, con shape*, stiffness)    TTCD  TTCL
TB   (con degree, con location, con shape*, stiffness)    TBCD  TBCL
TR*  (con degree*, con location*, , stiffness*)
VEL  (con degree, , , stiffness)                          VEL
GLO  (con degree, con location*, , stiffness)             GLO

Figure 2. Inventory of articulator sets and associated parameters.

For the present, we list without comment the possible descriptor values for the constriction degree (CD) and constriction location (CL) dimensions in (4). In § 2.1, we will discuss the gestural dimensions and these descriptors in detail, including a comparison to current proposals of feature geometry.


(4) CD descriptors: closed critical narrow mid wide

    CL descriptors: protruded labial dental alveolar post-alveolar palatal velar uvular pharyngeal
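Taken together, (3) and (4) amount to a small symbolic data structure. The sketch below (ours, not part of the original model; the class name, validation, and stiffness value are illustrative) encodes a gesture as an articulator set plus its descriptors:

from dataclasses import dataclass
from typing import Optional

CD_VALUES = ("closed", "critical", "narrow", "mid", "wide")
CL_VALUES = ("protruded", "labial", "dental", "alveolar", "post-alveolar",
             "palatal", "velar", "uvular", "pharyngeal")

@dataclass
class Gesture:
    articulator_set: str                # LIPS, TT, TB, VEL, or GLO
    con_degree: str                     # CD descriptor; always present
    con_location: Optional[str] = None  # CL descriptor; oral gestures only
    con_shape: Optional[str] = None     # not currently implemented
    stiffness: float = 1.0              # pointer to the k value (units illustrative)

    def __post_init__(self):
        # contrasting gestures differ in at least one descriptor
        assert self.con_degree in CD_VALUES
        assert self.con_location is None or self.con_location in CL_VALUES

# e.g., the bilabial closure in 'palm': a LIPS gesture with [closed] CD
# and [labial] CL (the stiffness value is invented for the example)
p_closure = Gesture("LIPS", "closed", "labial", stiffness=8.0)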

In the phonology of dynamically defined articulatory gestures that we are developing, gestures are posited to be the atoms of phonological structure. It is important to note that such gestures are relatively abstract. That is, the physically continuous movement trajectories are analyzed as resulting from a set of discrete, concurrently active gestural control regimes. They are discrete in two senses: (1) the dynamic parameters of a gesture's control regime remain constant throughout the discrete interval of time during which the gesture is active, and (2) gestures in a language may differ from one another in discrete ways, as represented by different descriptor values. Thus, as argued in Browman and Goldstein (1986) and Browman and Goldstein (in press), the gestures for a given utterance, together with their temporal patterning, perform a dual function. They characterize the actual observed articulator movements (thus obviating the need for any additional implementation rules), and they also function as units of contrast (and more generally capture aspects of phonological patterning). As discussed in those papers, the gesture as a phonological unit differs both from the feature and from the segment (or root node in current feature geometries). It is a larger unit than the feature, being effectively a unitary constriction action, parameterized jointly by a linked structure of features (descriptor values). Yet it is a smaller unit than the segment: several gestures linked together are necessary to form a unit at the segmental, or higher, levels.

1.3 Gestural scores: Articulatory tiers and internal duration

In the preceding section, gestures were defined with reference to a dynamical system that shapes patterns of articulatory movements. Each gesture possesses, therefore, not only an inherent spatial aspect (i.e., a tract variable goal) but also an intrinsic temporal aspect (i.e., a gestural stiffness). Much of the power of the gestural approach follows from these basic facts about gestures (combined with their abstractness), since they allow gestures to overlap in time as well as in articulator and/or tract variable space (see also Bell-Berti & Harris, 1981; Fowler, 1980, 1983; Fujimura, 1981a,b). In this section, we show how overlap among gestures is represented, and demonstrate that simple changes in the patterns of overlap between neighboring gestural units can automatically produce a variety of superficially different types of phonetic and phonological variation.

Within the computational model described above, the pattern of organization, or constellation, of gestures corresponding to a given utterance is embodied in a set of phasing principles (see Kelso & Tuller, 1987; Nittrouer et al., 1988) that specify the spatiotemporal coordination of the gestures (Browman & Goldstein, 1987). The pattern of intergestural coordination that results from applying the phasing principles, along with the interval of active control for individual gestures, is displayed in a two-dimensional gestural score, with articulatory tiers on one dimension and temporal information on the other (Browman et al., 1986; Browman & Goldstein, 1987). A gestural score for the word palm (pronounced [pam]) is displayed in Figure 3a. As can be seen in the figure, the tiers in a gestural score, on the vertical axis, represent the sets of articulators (or the relevant subset thereof) employed by the gestures, while the horizontal dimension codes time.

The boxes in Figure 3a correspond to individual gestures, labelled by their descriptor values for constriction degree and constriction location (where relevant). For example, the initial oral gesture is a bilabial closure, represented as a constriction of the LIPS, with a [closed] constriction degree, and a [labial] constriction location. The horizontal extent of each box represents the interval of time during which that particular gesture is active. During these activation intervals, which are determined in the computational model from the phasing principles and the inherent stiffnesses of each gesture, the particular set of dynamic parameter values that defines each gesture is actively contributing to shaping the movements of the model articulators. In Figure 3b, curves are added that show the time-varying tract variable trajectories generated by the task dynamic model according to the parameters indicated by the boxes. For example, during the activation interval of the initial labial closure, the curve representing LA (the vertical distance between the lips) decreases. As can be seen in these curves, the activation intervals directly capture something about the durations of the movements of the gestures.


Figure 3. Gestural score for palm [pam] using box notation; tiers (top to bottom) VEL, TB, LIPS, GLO; horizontal axis: time (ms). (a) Activation intervals only; (b) model-generated tract variable motions added.
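To make the score notation concrete, here is a rough sketch of the Figure 3a score as data (tier names and descriptors follow the figure; the millisecond values are approximate readings of the plot, not the model's actual values):

# One entry per gesture: (tier, CD descriptor, CL descriptor, onset_ms, offset_ms)
PALM_SCORE = [
    ("LIPS", "clo",    "labial",       0,  80),   # initial bilabial closure
    ("GLO",  "wide",   None,           0, 120),   # glottal opening for [p]
    ("TB",   "narrow", "pharyngeal",   0, 330),   # vowel gesture [a]
    ("VEL",  "wide",   None,         250, 430),   # velic opening for [m]
    ("LIPS", "clo",    "labial",     300, 430),   # final bilabial closure
]

def active_gestures(t_ms, score=PALM_SCORE):
    """Gestures whose activation interval contains time t."""
    return [g for g in score if g[3] <= t_ms < g[4]]

# e.g., at 310 ms the vowel, velic, and final labial gestures are all
# active at once, so the vowel is nasalized as the lips are closing.
print(active_gestures(310))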


Figure 4 presents an alternative symbolic redisplay2 of the gestural score. Instead of the 'box' notation of Figure 3, a 'point' notation is employed that references only the gestural descriptors and relative timing of their 'targets.' The extent of activation of the gestures in the box notation is not indicated. In addition, association lines between the gestures have been added. These lines indicate which gestures are phased with respect to each other. Thus, the display is a shorthand representation of the phasing rules discussed in Browman and Goldstein (1987). The pair of gestures connected by a given association line are coordinated so that a specified phase of one gesture is synchronized with some phase of the other. For example, the peak opening of the GLO [wide] gesture (180 degrees) is synchronized with the release phase (290 degrees) of the LIPS [clo labial] gesture. Also important for the phase rules is the projection of oral constriction gestures onto separate Vowel and Consonant tiers, which are not shown here (Browman & Goldstein, 1987, 1988; see also Keating, 1985).
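As a simplified illustration of such phase-based coordination: if each gesture's activation is treated as sweeping through 0-360 degrees over a period derived from its stiffness (the period formula below is an illustrative choice of ours, not the model's published rule), the onset time of a phased gesture follows directly:

import math

def natural_period(k):
    # Undamped period of a unit-mass second-order system; an illustrative
    # way of turning stiffness into a gesture-internal time scale.
    return 2.0 * math.pi / math.sqrt(k)

def phased_onset(onset1, k1, phase1_deg, k2, phase2_deg):
    """Onset time for gesture 2 such that its phase2 coincides with
    phase1 of gesture 1 (phases in degrees)."""
    t_sync = onset1 + (phase1_deg / 360.0) * natural_period(k1)
    return t_sync - (phase2_deg / 360.0) * natural_period(k2)

# Peak glottal opening (180 degrees) of GLO [wide] synchronized with the
# release phase (290 degrees) of LIPS [clo labial]; stiffnesses invented.
glo_onset = phased_onset(onset1=0.0, k1=400.0, phase1_deg=290.0,
                         k2=300.0, phase2_deg=180.0)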

The use of point notation highlights the association lines, and, as we shall see in § 2.2.2, is useful for the purpose of comparing gestural organizations with feature geometry representations in which individual units lack any extent in time. For the remainder of this section, however, we will be concerned with showing the extent of temporal overlap of gestures, and therefore will employ the box notation form of the gestural score.

The information represented in the gestural score serves to identify a particular lexical entry. The basic elements in such a lexical unit are the gestures, which, as we have already seen, can contrast with one another by means of differing descriptor values. In addition, gestural scores for different lexical items can contrast in terms of the presence vs. absence of particular gestures (Browman & Goldstein, 1986, in press; Goldstein & Browman, 1986). Notice that one implication of taking gestures as basic units is that the resulting lexical representation is inherently underspecified, that is, it contains no specifications for irrelevant features. When a given articulator is not involved in any specified gesture, it is attracted to a 'neutral' position specific to that articulator (Saltzman et al., 1988; Saltzman & Munhall, 1989).

The fact that each gesture has an extent in time, and therefore can overlap with other gestures, has a variety of phonological and phonetic consequences. Overlap between invariantly specified gestures can automatically generate contextual variation of superficially different sorts: (1) acoustic noninvariance, such as the different formant transitions that result when an invariant consonant gesture overlaps different vowel gestures (Liberman & Mattingly, 1985); (2) allophonic variation, such as the nasalized vowel that is produced by overlap between a syllable-final velic opening gesture and the vowel gesture (Krakow, 1989); and (3) various kinds of 'coarticulation,' such as the context-dependent vocal tract shapes for reduced schwa vowels that result from overlap by the neighboring full vowels (Browman & Goldstein, 1989; Fowler, 1981). Here, however, we will focus on the implication of directly representing overlap among phonological units for the phonological/phonetic alternations in fluent speech.

Figure 4. Gestural score for palm [pam] using point notation, with association lines added.


Browman and Goldstein (1987) have proposed that the wide variety of differences that have been observed between the canonical pronunciation of a word and its pronunciation in fluent contexts (e.g., Brown, 1977; Shockey, 1974) all result from two simple kinds of changes to the gestural score: (1) reduction in the magnitude of individual gestures (in both time and space) and (2) increase in overlap among gestures. That paper showed how these two very general processes might account for variations that have traditionally been described as segment deletions, insertions, assimilations and weakenings. The reason that increased overlap, in particular, can account for such different types of alternations is that the articulatory and acoustic consequences of increased overlap will vary depending on the nature of the overlapping gestures. We will illustrate this using some of the examples of deletion and assimilation presented in Browman and Goldstein (1987), and then compare the gestural account with the treatment of such examples in non-linear phonological theories that do not directly represent overlap among phonological units.

Let us examine what happens when a gestural score is varied by increasing the overlap between two oral constriction gestures, for example from no overlap to complete synchrony. This 'sliding' will produce different consequences in the articulatory and acoustic output of the model, depending on whether the gestures are on the same or different articulatory tiers, i.e., whether they employ the same or different tract variables. If the gestures are on different articulatory tiers, as in the case of a LIPS closure and a tongue tip (TT) closure, then the resulting tract variable motion for each gesture will be unaffected by the other concurrent gesture. Their tract variable goals will be met, regardless of the amount of overlap. However, with sufficient overlap, one gesture may completely obscure the other acoustically, rendering it inaudible. We refer to this as gestural 'hiding.' In contrast, when two gestures are on the same articulatory tier, for example, the tongue tip constriction gestures associated with /t/ and /θ/, they cannot overlap without perturbing each other's tract variable motions. The two gestures are in competition: they are attempting to do different tasks with the identical articulatory structures. In this case, the dynamical parameters for the two overlapping gestural control regimes are 'blended' (Saltzman et al., 1988; Saltzman & Munhall, in press).
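A minimal sketch of this contrast, under simplifying assumptions (the helper function and the plain parameter averaging are ours for illustration; the actual task-dynamic blending rule is more elaborate):

def combine(g1, g2):
    """g1, g2: dicts with 'tier', 'target', and 'stiffness' entries."""
    if g1["tier"] != g2["tier"]:
        # Different tract variables: both goals are met regardless of
        # overlap, though one constriction may become inaudible ('hiding').
        return [g1, g2]
    # Same tract variables: the overlapping control regimes are blended;
    # simple averaging stands in here for the task-dynamic blending rule.
    return [{
        "tier": g1["tier"],
        "target": (g1["target"] + g2["target"]) / 2.0,
        "stiffness": (g1["stiffness"] + g2["stiffness"]) / 2.0,
    }]

# Two TT closure gestures with different constriction locations, as in
# 'ten themes': the blend predicts a constriction between alveolar and
# dental (locations coded on an arbitrary numerical scale).
t_closure  = {"tier": "TT", "target": 0.0, "stiffness": 400.0}  # alveolar
th_closure = {"tier": "TT", "target": 1.0, "stiffness": 400.0}  # dental
print(combine(t_closure, th_closure))   # -> one blended TT gesture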

Browman and Goldstein (1987) presented examples of articulations in fluent speech that showed the hiding and blending behavior predicted by the model. Examples of hiding are transcribed in (5).

(5) (a) /pərfəkt ˈmɛməri/ → [pərfəkˈmɛməri]
    (b) /sɛvn ˈplʌs/ → [sɛvmˈplʌs]

In (5a), careful listening to a speaker's production of perfect memory produced in a fluent sentence context failed to reveal any audible [t] at the end of perfect, although the /t/ was audible when the two words were produced as separate phrases in a word list. This deletion of the final /t/ is an example of a general (variable) process in English that deletes final /t/ or /d/ in clusters, particularly before initial obstruents (Guy, 1980). However, the articulatory data for the speaker examined in Browman and Goldstein (1987) showed that nothing was actually deleted from a gestural viewpoint. The alveolar closure gesture at the end of perfect was produced in the fluent context, with much the same magnitude as when the two words were produced in isolation, but it was completely overlapped by the constrictions of the preceding velar closure and the following labial closure. Thus, the alveolar closure gesture was acoustically hidden. This increase in overlap is represented in Figure 5, which shows the (partial) gestural scores posited for the two versions of this utterance, based on the observed articulatory movements. The gestures for the first syllable of memory (shown as shaded boxes) are well separated from the gestures for the last syllable of perfect (shown as unshaded boxes) in the word list version in Figure 5a, but they slide earlier in time as shown in Figure 5b, producing substantial overlap among three closure gestures. Note that these three overlapping gestures are all on separate tiers.

In (5b), seven plus shows an apparent assimilation, rather than deletion, and was produced when the phrase was produced at a fast rate. Assimilation of final alveolar stops and nasals to a following labial (or velar) stop is a common connected speech process in English (Brown, 1977; Gimson, 1962). Here again, however, the articulatory data in Browman and Goldstein (1987) showed that the actual change was 'hiding' due to increased overlap: the alveolar closure gesture at the end of seven was still produced by the speaker in the 'assimilated' version, but it was hidden by the preceding labial fricative and following labial stop.


Figure 5. Partial gestural score for two versions of perfect memory, posited from observed articulator movements. Last syllable of perfect shown in unshaded boxes; first syllable of memory shown in shaded boxes. Only oral tiers (TB, TT, LIPS) are shown. (a) Words spoken in word list; (b) words spoken as part of fluent phrase.

The changes in the posited gestural score can be seen in Figure 6. Because the velum lowering gesture (VEL [wide]) at the end of seven in Figure 6b overlaps the labial closure, the hiding is perceived as assimilation rather than deletion. Evidence for such hidden gestures (in some cases having reduced magnitude) has also been provided by a number of electropalatographic studies, where they have been taken as evidence of 'partial assimilation' (Barry, 1985; Hardcastle & Roach, 1979; Kohler, 1976 [for German]). Thus, from a gestural point of view, deletion (5a) and assimilation (5b) of final alveolars may involve exactly the same process: increase of overlap resulting in a hidden gesture.

When overlap is increased between two gestures on the same articulatory tier, rather than on different articulatory tiers as in the above examples, the increased overlap results in blending between the dynamical parameters of the two gestures rather than hiding. The trajectories of the tract variables shared by the two gestures are affected by the differing amounts of overlap. Evidence for such blending can be seen in examples like (6).

(6) /tɛn θimz/ → [tɛn̪θimz]


Figure 6. Partial gestural score for two versions of seven plus, posited from observed articulator movements. Last syllable of seven shown in unshaded boxes; plus shown in shaded boxes. The starred [alveolar clo] gesture indicates that laterality is not represented in these scores. (a) Spoken at slow rate; (b) spoken at fast rate.

The apparent assimilation in this case, as well as in many other cases that involve conflicting requirements for the same tract variables, has been characterized by Catford (1977) as involving an accommodation between the two units. This kind of accommodation is exactly what is predicted by parameter blending, assuming that the same underlying mechanism of increased gestural overlap occurs here as in the examples in (5). The (partial) gestural scores hypothesized for (6), showing the increase in overlap, are displayed in Figure 7. The hypothesis of gestural overlap, and consequent blending, makes a specific prediction: the observed motion of the TT tract variables resulting from overlap and blending should differ from the motion exhibited by either of the individual gestures alone. In particular, the location of the constriction should not be identical to that of either an alveolar or a dental, but rather should fall somewhere in between. If this prediction is confirmed, then three (superficially) different fluent speech processes (deletion of final alveolar stops in clusters, assimilation of final alveolar stops and nasals to following labials and velars, and assimilation of final alveolar stops to other tongue tip consonants) can all be accounted for as the consequence of increasing overlap between gestures in fluent speech.


Figure 7. Hypothesized gestural score for two versions of ten themes. ten shown in unshaded boxes; themes shown in shaded boxes. Only oral tiers are shown. (a) Spoken with pause between words; (b) spoken as part of fluent phrase.

How does the gestural analysis of these fluent speech alternations compare with analyses proposed by other theories of non-linear phonology? Assimilations such as those in (6) have been analyzed by Clements (1985) as resulting from a rule that operates on sequences of alveolar stops or nasals followed by [+coronal] consonants. The rule, whose effect is shown in (7), delinks the place node of the first segment and associates the place node of the second segment to the first (by spreading).

(7)  Manner tier            [-cont]      [+cons]
                               |            |
     Supralaryngeal tier       o            o
                               =          /  |
     Place tier             [+cor]   [+cor, -ant]

where "=" marks the delinked association line, and the slanted line shows the place node of the second segment spreading to the first.

Since the delinked features are assumed to be deleted, by convention, and not realized phonetically, the analysis in (7) predicts that the place of articulation of the assimilated sequence should be indistinguishable from that of the second consonant when produced alone. This claim differs from that made by the blending analysis, which predicts that the assimilated sequence should show the influence of both consonants. These conflicting predictions can be directly tested.

The overlap analysis accounts for a wider range of phenomena than just the assimilations in (6). The deletion of final alveolars (in clusters) and the assimilation of final alveolars to following stops in (5) were also shown to be cases of hiding due to increased gestural overlap. Clements' analysis does not handle these additional cases, and cannot be extended to do so without major reinterpretation of autosegmental formalisms. To see that this is the case, suppose Clements' analysis is extended by eliminating the [+coronal] requirement on the second segment (for fluent speech). This would produce an assimilation for a case like (5b), but it would not be consistent with the data showing that the alveolar closure gesture is, in fact, still produced. To interpret a delinked gesture as one that is articulatorily produced, but auditorily hidden, would require a major change in the assumptions of the framework.



Within the framework of Sagey (1986), the cases of hiding in (5) could be handled, but the analysis would fail to capture the parallelism between these cases and the blending case in (6). In Sagey's framework, there are separate class nodes for each of the independent oral articulators, corresponding to the articulatory tiers. Thus, an assimilation like that in (5b) could be handled as in (8): the labial node of the second segment is spread to the preceding segment's place node, effectively creating a 'complex' segment involving two articulations.

(8)     ROOT                  ROOT
        [-cont]               [+cons]
           |                     |
        Supralaryngeal        Supralaryngeal
           |                     |
         Place                 Place
           |                  /   |
        Coronal          Labial

where the Labial node of the second segment spreads (slanted line) to the place node of the first.

However, the example in (6) could not be handled in this way. In Sagey's framework, complex segments can only be created in case there are different articulator nodes, whereas in (6), the same articulator is involved. Thus, an analysis in Sagey's framework will not treat the examples in (5) and (6) as resulting from a single underlying process. If the specific prediction made by the overlap and blending analysis for cases like (6) proves correct, this would be evidence that a unitary process (overlap) is indeed involved, a unity that is directly captured in the overlapping gesture approach, but not in Sagey's framework.

Finally, we note that, in general, the gestural approach gives a more constrained and explanatory account of the casual speech changes. All changes are hypothesized to result from two simple mechanisms, which are intrinsically related to the talker's goals of speed and fluency: reduce the size of individual gestures, and increase their overlap. The detailed changes that emerge from these processes are epiphenomenal consequences of the 'blind' application of these principles, and they are explicitly generated as such by our model. Moreover, the gestural approach makes predictions about the kinds of fluent speech variation expected in other languages. Given differences between languages in the canonical gestural scores for lexical items (due to language-specific phasing principles), the same casual speech processes are predicted to have different consequences in different languages. For example, in a language such as Georgian in which stops in the canonical form are always released, word-finally and in clusters (Anderson, 1974), the canonical gestural score should show less overlap between neighboring stops than is the case in English. The gestural model predicts that overlap between gestures will increase in casual speech. But in a language such as Georgian, an increase of the same magnitude as in English would not be sufficient to cause hiding. Thus, no casual speech assimilations and deletions would be predicted for such a language, at least when the increase of gestural overlap is of the same magnitude as for English.

The usefulness of internal duration and overlap among phonological elements has begun to be recognized by phonologists (Hammond, 1988; Sagey, 1988). For example, Sagey (1986, pp. 20-21) has argued that phonological association lines, which serve to link the features on separate tiers, 'represent the relation of overlap in time... Thus... the elements that they link... must have internal duration.' However, in nongestural phonologies, the consequences of such overlap cannot be evaluated without additional principles specifying how overlapping units interact. In contrast, in a gestural phonology, the nature of the interaction is implicit in the definition of the gestures themselves, as dynamical control regimes for a (model) physical system. Thus, one of the virtues of the gestural approach is that the consequences of overlap are tightly constrained and made explicit in terms of a physical model. Explicit predictions (e.g., about the relation between the constrictions formed by the tongue tip in themes and in ten themes, as well as about language differences) can be made that test hypotheses about phonological structures.


In summary, a gestural phonology characterizes the movements of the vocal tract articulators during speech in terms of a minimal set of discrete gestural units and patterns of spatiotemporal organization among those units, using dynamic and articulatory models that make explicit the articulatory and acoustic consequences of a particular gestural organization (i.e., gestural score). We have argued that a number of phonological properties of utterances are inherent in these explicit gestural constellations, and thus do not require postulation of additional phonological structure. Distinctiveness (Browman & Goldstein, 1986, in press) and syllable structure (Browman & Goldstein, 1988) can both be seen in gestural scores. In addition, a number of postlexical phonological alternations (Browman & Goldstein, 1987) can be better described in terms of gestures and changes in their organization (overlap) than in terms of other kinds of representations.

2 RELATION BETWEEN GESTURAL STRUCTURES AND FEATURE GEOMETRY

In the remainder of this paper, we look more closely at the relation between gestural structures and recent proposals of feature geometry. The comparison shows that there is much overall similarity and compatibility between feature geometry and the geometry of phonological gestures (§ 2.1). Nevertheless, there are some differences. Most importantly, we show that the gesture is a cohesive unit, that is, a coordinated action of a set of articulators, moving to achieve a constriction that is specified with respect to its degree as well as its location. We argue that the gesture, and gestural scores, could usefully be incorporated into feature geometry (§ 2.2). The gestural treatment of constriction degree as part of a gestural unit leads, however, to an apparent disparity with how manner features are currently handled in feature geometry. We conclude, therefore (§ 3), by proposing a hierarchical tube geometry that resolves this apparent disparity, and that also, we argue, clarifies the nature of manner features and how they should be treated within feature geometry, or any phonological approach.

2.1 Articulatory geometry and gestural descriptors

In this section, the details of the gestural descriptor structures (the distinctive categories of the tract variable parameters) are laid out. Many aspects of these structures are rooted in long tradition. Jespersen (1914), for example, suggested an 'analphabetic' system of specifying articulatory place along the upper tract, the articulatory organ involved, and the degree of constriction. Pike (1943), in a more elaborated analphabetic system, included variables for impressionistic characterization of the articulatory movements, including 'crests,' 'troughs,' and 'glides' (movements between crests and troughs). More recently, various authors (e.g., Campbell, 1974; Halle, 1982; Sagey, 1986; Venneman & Ladefoged, 1973) have argued that phonological patterns are often formed on the basis of the moving articulator used (the articulator set, in the current system). The gestural structures described in this paper differ from these accounts primarily in the explicitness of the functional organization of the articulators into articulatory gestures, and in the use of a dynamic model to characterize the coordinated movements among the articulators.

The articulatory explicitness of the gestural approach leads to a clear-cut distinction between features of input and features of output. That is, a feature such as 'sonority' has very little to do with articulation, and a great deal to do with acoustics (Ladefoged, 1988a,b). This difference can be captured by contrasting the input to the speech production mechanism (the individual gestures in the lexical entry) and the output (the articulatory, aerodynamic and acoustic consequences of combining several gestures in different parts of the vocal tract). The gestural descriptors, then, characterize the input mechanism: they are the 'features' of a purely articulatory phonology. Most traditional feature systems, however, represent a conflation of articulatory and acoustic properties. In order to avoid confusion with such combined feature systems, and in order to emphasize that gestural descriptors are purely articulatory, we retain the non-standard terminology developed in the computational model for the category names of the gestural descriptor values.

In § 2.1.1, we discuss the descriptors corresponding to the articulator sets and show how they are embedded in a hierarchical articulatory geometry, comparable to recent proposals of feature geometry. In §§ 2.1.2-6, we examine the other descriptors in turn, demonstrating that they can be used to define natural classes, and showing how the differences between these descriptors and other feature systems stem from their strictly articulatory and/or dynamic status.



2.1.1 Anatomical hierarchy and articulator sets

The gestural descriptors listed in Figure 2 are not hierarchically organized. That is, each descriptor occupies a separate dimension describing the movement of a gesture. However, there is an implicit hierarchy for the sets of articulators involved. This can be seen in Figure 8, which redisplays (most of) the inventory of articulator sets and associated parameter dimensions from Figure 2, and adds a grouping of gestures into a hierarchy based on articulatory independence. Because the gestures characterize movements within the vocal tract, they are effectively organized by the anatomy of the vocal tract. TT and TB gestures share the individual articulators of tongue body and jaw, so they define a class of Tongue gestures. Similarly, both LIPS and Tongue gestures use the jaw, and are combined in the class of Oral gestures. Finally, Oral, velic (VEL), and glottal (GLO) constitute relatively independent subsystems that combine to form a description of the overall Vocal Tract. This anatomical hierarchy constitutes, in effect, an articulatory geometry.

                   Vocal Tract
                  /     |     \
               Oral    VEL    GLO
              /    \
          LIPS    Tongue
                  /    \
                TT      TB

Figure 8. Articulatory geometry tree.
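The hierarchy just described can be made concrete as a small tree; the helper below (ours, for illustration) returns the lowest class node dominating two articulator sets, the notion used in the text to group TT and TB under Tongue, and LIPS and Tongue under Oral:

ARTICULATORY_GEOMETRY = {
    "Vocal Tract": {
        "Oral": {
            "LIPS": {},
            "Tongue": {"TT": {}, "TB": {}},
        },
        "VEL": {},
        "GLO": {},
    }
}

def path_to(node, tree=ARTICULATORY_GEOMETRY, trail=()):
    """Chain of class nodes from the root down to `node`, or None."""
    for name, sub in tree.items():
        here = trail + (name,)
        if name == node:
            return here
        found = path_to(node, sub, here)
        if found:
            return found
    return None

def lowest_common_node(a, b):
    """Lowest node dominating both articulator sets."""
    pa, pb = path_to(a), path_to(b)
    common = [x for x, y in zip(pa, pb) if x == y]
    return common[-1]

# lowest_common_node("TT", "TB")   -> 'Tongue'
# lowest_common_node("LIPS", "TB") -> 'Oral'
# lowest_common_node("TT", "GLO")  -> 'Vocal Tract'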

The importance of an articulatory geometry has repeatedly been noted, for example in the use of articulatory subsystems by Abercrombie (1967) and Anderson (1974). More recently, it has formed the basis of a number of proposals of feature geometry (Clements, 1985; Ladefoged, 1988a,b; Ladefoged & Halle, 1988; McCarthy, 1988; Sagey, 1986). The hierarchy of the moving articulators is central to all these proposals of featural organization, although the evidence discussed may be phonological patterns rather than physiological structure. Leaving aside manner features (to be discussed in §§ 2.2.1 and 3), the various hierarchies differ primarily in the inclusion of a Supralaryngeal node in the feature geometries of Clements (1985) and Sagey (1986), and a Tongue node in the articulatory geometry diagrammed in Figure 8.

There is no Supralaryngeal node included in the articulatory geometry of Figure 8, because this geometry effectively organizes the vocal tract input. As we will argue in § 3, the Supralaryngeal node is important in characterizing the output of the vocal tract. However, using the criterion of articulatory independence, we see no anatomical reason to combine any of the three major subsystems into a higher node in the input hierarchy. Rather than being part of a universal geometry, further combinations of the laryngeal, velic, and Oral subsystems into class nodes such as Supralaryngeal, or Central (Oral) vs. Peripheral (velic and laryngeal), should be invoked where necessitated by language-particular organization. For example, the central-peripheral distinction was argued to exist for Toba Batak (Hayes, 1986).

The Tongue node in Figure 8 is proposed on anatomical grounds, again using the criterion of articulatory independence (and relatedness). That is, the TT articulator set shares two articulators with the TB articulator set: the tongue body and the jaw. Put another way, the tongue is an integral structure that is clearly separate from the lips. There is some evidence for this node in phonological patterns as well. A number of articulations made with the tongue cannot be clearly categorized as being made with either TT or TB. For example, laterals in Kuman (Papua New Guinea) alternate between velars and coronals (Lynch, 1983, cited in McCarthy, 1988). They are always Tongue articulations, but are not categorizable as exclusively TT or TB. Rather, they require some reference to both articulator sets. This is also the case for English laterals, which can alternate between syllable-initial coronals and syllable-final velars (Ladefoged, 1982).

Palatal and palatoalveolar consonants are another type of articulation that falls between TT and TB articulations (Keating, 1988; Ladefoged & Maddieson, 1986; Recasens, in preparation). Keating (1988) suggests that the intermediate status of palatals (partly TT and partly TB) can be handled by treating palatal consonants as complex segments, represented under both the tongue body and tongue tip/blade nodes. However, without the higher-level Tongue node, such a representation equates complex segments such as labio-velars, consisting of articulations of two separate articulators (lips and tongue), with palatals, arguably a single articulation of a single predorsal region of the tongue (Recasens, in preparation). With the inclusion of the Tongue node, labio-velars and palatals would be similar in being double Oral articulations, but different in being LIPS plus Tongue articulations vs. two Tongue articulations. Thus, the closeness of the articulations is reflected in the level of the nodes.

This evidence is suggestive, but not conclusive, on the role of the Tongue node in phonological patterns. We predict that more evidence of phonological patterns based on the anatomical interdependence of the parts of the tongue should exist. One type of evidence should result from the fact that one portion of the tongue cannot move completely independently of the other portions. This lack of independence can be seen, for example, in the suggestion that the tongue body may have a characteristically more backed shape for consonants using the tip/blade in dentals as opposed to alveolars (Ladefoged & Maddieson, 1986; Stevens et al., 1986). We would expect to find blocking rules based on such considerations.

2.1.2 Constriction degree (CD)

CD is the analog within the gestural approach of the manner feature(s). However, it is crucial to note that, unlike the manner classes in feature geometry, CD is first and foremost an attribute of the gesture (the constriction made by the moving set of articulators) and therefore, at the gestural level, is solely an articulatory characterization. (This point will be elaborated in § 2.2.1). In our model, CD is a continuum divided into the following discrete ranges: [closed], [critical], [narrow], [mid], and [wide] (Ladefoged, 1988b, refers to such a partitioning of a continuum as an 'ordered set' of values). The two most closed categories correspond approximately to acoustic stops and fricatives; the names used for these categories indicate that these values are articulatory rather than acoustic. Thus, the second degree of constriction, labelled [critical], indicates that critical degree of constriction for a gesture at which some particular aerodynamic consequences could obtain if there were appropriate air flow and muscular tension. That is, the critical constriction value permits frication (turbulence) or voicing, depending on the set of articulators involved (oral or laryngeal). Similarly, [closed] refers to a tight articulatory closure for that particular gesture; the overall state of the vocal tract might cause this closure to be, acoustically, either a stop or a sonorant. This, in turn, will be determined by the combined effects of the concurrently active gestures. § 3 will discuss in greater detail how we account for natural classes that depend on the consequences of combining gestures.

The categorical distinctions among [closed], [critical], and the wider values (as a group) are clearly based on quantal articulatory-acoustic relations (Stevens, 1972). The basis for the distinction among wider values is not as easy to find in articulatory-acoustic relations, although [narrow] might be identified with Catford's (1977) [approximant] category. Catford is also careful to define 'articulatory stricture' in solely articulatory terms. He defines [approximant] as a constriction just wide enough to yield turbulent flow under high airflow conditions, such as an open glottis for voicelessness, but laminar flow under low airflow conditions, such as in voicing. The other descriptors are required to distinguish among vowels. For example, contrasts among front vowels differing in height are represented as [palatal] constriction locations (cf. Wood, 1982) with [narrow], [mid], or [wide] CDs, where these categories might be established on the basis of sufficient perceptual and articulatory contrast in the vowel system (Lindblom, 1986; Lindblom et al., 1983). If additional differentiation is required, values are combined to indicate intermediate values (e.g., [narrow mid]). In addition, [wide] vs. [narrow] can be used to distinguish the size of glottal aperture (the CD for GLO) associated with aspirated and unaspirated stops, respectively.
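
Since the CD descriptors are categorical ranges over a continuous constriction-degree parameter, the partitioning can be pictured as threshold-based classification. The following Python sketch is hypothetical: the numeric thresholds are invented placeholders, not values from the model.

    # Hypothetical sketch: mapping a continuous constriction degree
    # (here, aperture in mm) onto the ordered CD categories.
    # The threshold values are invented for illustration only.
    CD_THRESHOLDS = [
        (0.0, "clo"),     # tight closure
        (1.0, "crit"),    # critical: turbulence or voicing possible
        (4.0, "narrow"),
        (10.0, "mid"),
    ]

    def cd_category(aperture_mm):
        for upper_bound, label in CD_THRESHOLDS:
            if aperture_mm <= upper_bound:
                return label
        return "wide"

    print(cd_category(0.5))   # -> "crit"
    print(cd_category(20.0))  # -> "wide"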

2.1.3 Constriction location (CL)

Unlike CD, which differs from its featural analog of manner by being articulatory rather than acoustic, both CL and its featural analog of place are articulatory in definition. However, CL differs from place features in not being hierarchically related to the articulator set. That is, the set of articulators moving to make a constriction, and the location of that constriction, are two independent (albeit highly related) dimensions of a gesture. Thus, we use the label 'Oral' rather than 'place' in the articulatory hierarchy, not only to emphasize the anatomical nature of the hierarchy, but also to avoid the conflation of moving articulator and location along the tract that 'place' conveys.

The constriction location refers to the location on the upper or back wall of the vocal tract where a gestural constriction occurs, and thus is separate from, but constrained by, the articulator set moving to make the constriction. Figure 9 expands the articulator sets, the CL descriptor values, and the possible relations between them: the locations where a given set of articulators can form a constriction. Notice that each articulator set maps onto a subset of the possible constriction locations (rather than all possible CLs), where the subset is determined by anatomical possibility. Notice also that there is a non-unique mapping between articulator sets and CL values. For example, both TT and TB can form a constriction at the hard palate; indeed, it may be possible for TB to form a constriction even further forward in the mouth. Thus constriction location cannot be subsumed under the moving articulator hierarchy, contrary to the proposals by Ladefoged (1988a) and Sagey (1986), among others.

Figure 9. Possible mappings between articulator sets (LIPS, TT, TB) and constriction locations (protruded, labial, dental, alveolar, post-alveolar, palatal, velar, uvular, pharyngeal).

This can be seen most clearly in the case of the [labial] and [dental] locations, where either the LIPS or TT articulator sets can form constrictions, as shown in (9) (the unusual linguo-labials are discussed in Maddieson, 1987). (A similar matrix is presented in Ladefoged & Maddieson, 1986, but not as part of a formal representation.) That is, the [labial] constriction location cannot be associated exclusively with the LIPS articulator set. Similarly, the [dental] constriction location cannot be associated exclusively with the TT articulator set. Thus, we view CL as an independent cross-classifying descriptor dimension whose values cannot be hierarchically subsumed uniquely under particular articulator sets.

(9)                Articulator set
                   LIPS            TT
    CL  labial     bilabial        linguo-labial
        dental     labio-dental    dental

We use multivalued (rather than binary-valued) CL descriptors, following Ladefoged (1988b), since the CL descriptor values correspond to categorical ranges of the continuous dynamic parameter. In the front of the mouth, the basis for the discrete ranges of CL presumably involves the actual differentiated anatomical landmarks, so that parameter values are tuned with respect to them. There may, in addition, be relatively invariant auditory properties associated with the different locations (Stevens, 1989). For the categories that are further back (palatal and beyond), Wood (1982) hypothesizes that the distinct CLs emerge from the alignment of Stevens' quantal considerations with the positioning possibilities allowed by the tongue musculature. To the extent that this set of descriptor values is too limited, it can, again, be extended by combining descriptors, e.g., [palatal velar], or by using 'retracted' and 'advanced' (Principles of the IPA, 1949).
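
The many-to-many relation between articulator sets and constriction locations can be made explicit as a mapping, as in this Python sketch; the exact membership of each set is partly our reconstruction of Figure 9 and matrix (9), for illustration only.

    # Sketch of Figure 9: each articulator set can form constrictions
    # at an anatomically possible subset of locations. Membership is
    # our reconstruction of the figure, not an authoritative list.
    POSSIBLE_CL = {
        "LIPS": ["protruded", "labial", "dental"],
        "TT":   ["labial", "dental", "alveolar", "post-alveolar", "palatal"],
        "TB":   ["palatal", "velar", "uvular", "pharyngeal"],
    }

    def articulator_sets_for(cl):
        """CL values map non-uniquely onto articulator sets: more than
        one set may form a constriction at the same location."""
        return [a for a, locations in POSSIBLE_CL.items() if cl in locations]

    # Both LIPS and TT can form [dental] constrictions, cf. matrix (9).
    print(articulator_sets_for("dental"))  # -> ['LIPS', 'TT']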

2.1.4 Constriction shape (CS)

In some cases, gestures involving the same articulator sets and the same values of CL and CD may differ in the shape of the constriction, as looked at in the frontal, rather than sagittal, plane. For example, constrictions involving TT gestures may differ as to whether they are formed with the actual tip or blade of the tongue, the shape of the constriction being 'wider' in the third dimension if produced with the blade. The importance of this difference has been built into recent feature systems as apical vs. laminal (Ladefoged, 1988a), or [distributed] (Sagey, 1986).


Some method for controlling such differences needs to be built into our computational model. An additional TT tract variable (TTR) that specifies the orientation (angle) of the tongue tip in the sagittal plane with respect to the CL and CD axes is currently being incorporated into the task dynamic model. It is possible that different settings of this variable will be able to produce the required apical/laminal differences, as well as allowing the sublaminal contact involved in extreme retroflex stops (Ladefoged & Maddieson, 1986).

For TB gestures, an additional tract variable is also required to control cross-sectional shape. One of the relevant shape differences involves the production of laterals, in which at least one of the sides of the tongue does not make firm contact with the molars. Ladefoged (1980) suggests that the articulatory maneuver involved is a narrowing of the tongue volume, so that it is pulled away from the sides of the mouth. Such narrowing is an attractive option for an additional TB shaping tract variable. In this proposal, an alveolar lateral would essentially involve two gestures: a TT closure, and a TB gesture with a [narrowed] value for CS and perhaps defaults for CL and CD. (Some further specification of TBCD would clearly be required for lateral fricatives, as opposed to lateral approximants). In this sense, laterals would be complex constellations of gestures, as suggested by Ladefoged and Maddieson (1986), and similar to the proposal of Keating (1988) for treating palatals as complex segments. Another role for a TB shaping tract variable might involve bunching for rhotics (Ladefoged, 1980). Finally, it is unclear whether an additional shape parameter is required for LIPS gestures. It might be required to describe differences between the two kinds of rounding observed in Swedish (e.g., Lindau, 1978; Linker, 1982). On the other hand, given proper constraints on lip shape (Abry & Boe, 1986), it may be possible to produce all the required lip shapes with just LA and LP.

2.1.5 Dynamic descriptors

The stiffness (k) of a gesture is a dynamical parameter, inferred from articulatory motions, that has been shown to vary as a function of gestural CD, stress, and speaking rate (Browman & Goldstein, 1985; Kelso et al., 1985; Ostry & Munhall, 1985). In addition, however, we hypothesize that stiffness may be tuned independently, so that it can serve as the primary distinction between two gestures. /j/ and /w/ are two cases in which gestural stiffness as an additional independent parameter may be specified. /j/ is a TB [narrow palatal] gesture, and /w/ is a complex formed by a TB [narrow velar] gesture and a LIPS [narrow protruded] gesture. Our current hypothesis is that these gestures have the same CD and CL as for the corresponding vowels (/i/ and /u/), but that they differ in having an [increased] value of stiffness. This is similar to the articulatory description of glides in Catford (1977). Finally, it is possible that the stiffness value that governs the rate of movement into a constriction is related to the actual biomechanical stiffness of the tissues involved. If so, it would be relevant to those gestures that, in fact, require a specific muscular stiffness: trills and taps likely involve characteristic values of oral gesture stiffness, and pitch control and certain phonation types require specified vocal fold stiffnesses (Halle & Stevens, 1971; Ladefoged, 1988a).

2.2 The representation of phonological units

In § 2.1, we laid out the details of gestural descriptors and how they are organized by an articulatory geometry. Setting aside for the moment the critical aspects of gestures discussed in § 1.3 (internal duration and overlap), the differences between feature geometry and the organization of gestural descriptors have so far been comparatively minor: primarily the proposed Tongue node and the non-hierarchical relation between constriction location and the moving articulators. In this section, we consider two ways in which a gestural analysis suggests a different organization of phonological structure from that proposed by the feature geometries referred to in § 2.1.1. First, we present evidence that gestures function as cohesive units of phonological structure (§ 2.2.1). Second, we discuss the advantages of the gestural score as a phonological notation (§ 2.2.2).

2.2.1 The gesture as a unit in phonological patterns

In Figure 8, gestures are in effect the terminal nodes of a feature tree. Notice that the constriction degree, as well as constriction location, constriction shape, and stiffness, is part of the descriptor bundle. That is, constriction degree is specified directly at the articulator node; it is one dimension of gestural movement. Looked at in terms of the gestural score (e.g., Figure 3), successive units on a given oral tier contain specifications for both constriction location and degree of that articulator set. This positioning of CD differs from that proposed in current feature geometries. While CL and CS (or their analogs) are typically considered to be dependents of the articulator nodes, CD (or its analogs) is not. Rather, some of the closest analogs to CD ([stricture], [continuant], and [sonorant]) are usually associated with higher levels, either the Supralaryngeal node (Clements, 1985) or the Root node (Ladefoged, 1988a,b; Ladefoged & Halle, 1988; McCarthy, 1988; Sagey, 1986). Is there any evidence, then, that the unit of the gesture plays a role in phonological organization? Within the approach of feature geometry, such evidence would consist, for example, of rules in which an articulator set and the degree and location of the constriction it forms either spread or delete together, as a unitary whole.

It is in fact generally assumed, implicitly although not explicitly, that velic and glottal features have this type of unitary gestural organization. That is, features such as [+nasal] and [-voice] combine the constriction degree (wide) along with the articulator set (velum or glottis). Assuming a default specification for these features ([-nasal] and [+voice]), then denasalization consists of the deletion of a nasal gesture, and intervocalic voicing of the deletion of a laryngeal gesture. Additional implicit use of a gestural unit can be found in Sagey's (1986) proposal that [round] be subordinate to the Labial node. This is exactly the organization that a gestural analysis suggests, since [round] is effectively a specification of the degree and nature of the constriction of the lips.

It is in the case of primary oral gestures that proposals positioning CD at the gestural level and those positioning it at higher levels contrast most sharply. The inherent connection among all the component aspects of making a constriction, or in other words, the unitary nature of the gesture, can be seen most clearly when oral gestures are deleted, a phenomenon sometimes described as delinking of the Place (or Supralaryngeal) node: debuccalization. In a gestural analysis, when the movement of an articulator to make a constriction is deleted, everything about that constriction is deleted, including the constriction degree.

For example, Thráinsson (1978, cited in Clements & Keyser, 1983) demonstrates that in Icelandic, the productive phenomenon whereby the first of a sequence of two identical voiceless aspirated stops is replaced by [h] consists of deleting the first set of supralaryngeal features.

This set of supralaryngeal features corresponds to the unit of an oral gesture; it includes the constriction degree, whether described as [clo], [stop], or [-continuant]. Thus, in this example, the entire oral gesture is deleted.

Another example is cited in McCarthy (in press), using data from Straight (1976) and Lombardi (1987). In homorganic stop clusters and affricates (with an intervening word boundary) in Yucatec Maya, the oral portion of the initial stop is deleted; for example, /k#k/ → /h#k/, and (affricate) /ts#t/ → /s#t/. Again in this case the constriction degree is deleted along with the place, supporting a gestural analysis: the entire oral closure gesture is deleted. Note that McCarthy effectively supports this analysis, in spite of his positing [continuant] as dependent on the Root node, when he says 'a stop becomes a segment with no value for [continuant], which is incompatible with supraglottal articulation.'

Thus, there is some clear evidence in phonological patterning for the association of CD with the articulator node. This aspect of gestural organization is totally compatible with the basic approach of feature geometry, requiring only that CD be linked to the articulator node rather than to a higher level node. In § 3, we will show how this gestural affiliation of CD is also compatible with phonological examples in which CD is seemingly separable from the gesture, where it is 'left behind' when a particular articulator-CL combination is deleted, or where it appears not to spread along with the articulator set and CL. For now, we turn to a second aspect of the gestural approach that could usefully be incorporated into feature geometry, this one involving the use of the gestural score as a two-dimensional projection of the inherently three-dimensional phonological representation.

2.2.2 Gestural scores, articulatory geometry, and phonological units

The gestural score, in particular the 'point' form in Figure 4, is topologically similar to a nonlinear phonological representation. When combined with the hierarchical articulatory geometry, the gestural score captures several relevant dimensions of a nonlinear representation simultaneously, in a clear and revealing fashion. Figure 10 shows a gestural score for palm, using point notation and association lines, combined with the articulatory geometry on the left. Note that the geometry tree is represented on its side, rather than the more usual up-down orientation.


Figure 10. Gestural score for palm [pam] in point notation, with articulatory geometry (tiers: VEL, TB, LIPS, GLO).

This seemingly trivial point in fact is a very useful consequence of using gestural scores as phonological notation, as we shall see. It permits spatial organization such as the articulatory geometry, which is represented on the vertical axis, to be separated from temporal information, including sequencing of phonological units, which is represented on the horizontal axis. Such a separation is particularly useful in those instances in which the gestures in a phonological unit are not simultaneous.

For example, prenasalized stops are single phonological units consisting of a sequence of nasal specifications. Figure 11a depicts a gestural score for prenasalized [mb], which contains a closure gesture on the LIPS tier associated with a sequence of two gestures on the VEL tier, one with CD = [wide] followed by one with CD = [clo]. Double articulations also constitute a single phonological unit. Figure 11b displays a gestural score for [gb], which consists of a [clo labial] gesture on the LIPS tier associated with a [clo velar] gesture on the TB tier. Sagey (1986) terms the first type of unit a contour segment, and the second type a complex segment, a terminology that is an apt description of the two figures.

Compare the representations in Figures 11a and 11b to those in Figures 11c and 11d, which are Sagey's (1986) feature geometry specifications of the same two phonological units. In the feature geometry representation, the two types of segments appear to be equally sequential, or equally non-sequential. This is a consequence, an unfortunate one, of conflating two uses of branching notation: that of indicating (order-free) hierarchical information and that of indicating sequencing information. Thus, in Figure 11d (on the right), the branching lines represent order-free branching in a hierarchical tree. In Figure 11c, however, the branching lines represent the associations between two ordered elements and a single node on another tier. This conflation results from the particular choice of how to project an inherently three-dimensional phonological structure (feature hierarchy x phonological unit constituency x time) onto two dimensions.

In the gestural score, the two-dimensional projection avoids this conflation of the two uses of branching notation. Here nodes always represent gestures and lines are always association (or phasing) lines. Thus any branching, as in Figure 11a, always indicates temporal information about sequencing. The hierarchical information about the articulatory geometry is present in the organization of the articulatory tiers, and is thus represented (once) by a sideways tree all the way at the left of the gestural score (as in Figure 10). In this way, the gestural score retains the virtues of earlier forms of phonological notation in which sequencing information was clearly distinguishable, while also providing the benefits of the spatial geometry that organizes the tiers hierarchically. It would, of course, be possible to adopt this kind of representation within feature geometry.
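
To make the two-dimensional projection concrete, a gestural score can be thought of as tiers of temporally located gestures plus association (phasing) lines between them. The following minimal Python sketch encodes the score for palm in Figure 10; the encoding and the integer time slots are our own illustrative choices, not the model's notation.

    # Illustrative sketch: tiers (vertical axis, ordered by the
    # articulatory geometry) hold gestures located in time
    # (horizontal axis, here coarse integer slots).
    score_palm = {
        "VEL":  [("wide", 2)],                # velic opening for final [m]
        "TB":   [("narrow pharyngeal", 1)],   # vowel gesture for [a]
        "LIPS": [("clo labial", 0),           # closure for [p]
                 ("clo labial", 2)],          # closure for [m]
        "GLO":  [("wide", 0)],                # glottal opening for [p]
    }

    # Association (phasing) lines: pairs of (tier, index) phased together.
    associations = [
        (("LIPS", 0), ("GLO", 0)),   # [p]: lip closure + open glottis
        (("LIPS", 1), ("VEL", 0)),   # [m]: lip closure + lowered velum
    ]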


Finally, the reader may have observed that, in discussions to now, constituency in phonological units has been indicated solely by using association lines between gestures. In particular, there has been no separate representation of prosodic phonological structure. This is not an inherent aspect of a phonology of articulatory gestures; rather, it is the result of our current research strategy, which is to see how much structure inheres directly in the relations among gestures, without recourse to higher level nodes. However, once again, it is possible to integrate the gestural score with other types of phonological representation. To exemplify how the gestural score can be integrated with explicit phonological structure, Figure 12 displays a mapping between the gestural score for palm and a simplified version of the prosodic structure of Selkirk (1988), with the articulatory geometry indicated on the left.

Figure 11. Comparison of gestural score and feature geometry representations for contour and complex segments. (a) Gestural score for contour segment [mb]; (b) gestural score for complex segment [gb]; (c) feature geometry for contour segment [mb]; (d) feature geometry for complex segment [gb].

Figure 12. Gestural score for palm [pam] in point notation, mapped onto prosodic structure, with articulatory hierarchy on left.


In short, although gestures and gestural scores originated in a description of articulator movement patterns, they nevertheless provide constructs that are useful for phonological representation. Gestures constitute an organization (combining a moving articulator set with the degree and location of the constriction formed) that can act as a phonological unit. Gestural scores provide a useful notation for nonlinear phonological representations, one that permits phonological constituency to be expressed in the same representation with 'phonetic' order information, which is indicated by the relative positions of the gestures in the temporal matrix. The gestures can be grouped into higher-level units, using either association lines or a mapping onto prosodic structure, regardless of the simultaneity, sequentiality, or partial overlap of the gestures. In addition, the articulatory geometry organizing the tiers can be expressed in the same representation.

3 TUBE GEOMETRY AND CONSTRICTION DEGREE HIERARCHY

So far, we have been treating individual gestures as the terminal nodes of an anatomical hierarchy. From this perspective, articulatory geometry serves as a way of creating natural classes of gestures on anatomical grounds. In this section, we turn to the vocal tract as a whole, and consider how additional natural classes emerge from the combined effect (articulatory, aerodynamic, and acoustic consequences) of a set of concurrent gestures. We argue that these natural classes can best be characterized using a hierarchy for constriction degree within the vocal tract based on tube geometry.

Rather than viewing the vocal tract as a set of articulators, organized by the anatomy, tube geometry views it as a set of tubes, connected either in series or in parallel. The sets of articulators move within these tubes, creating constrictions. In other words, articulatory gestures occur within the individual tubes. More than one gesture may be simultaneously active; together, these gestures interact to determine the overall aerodynamic and acoustic output of the entire linked set of tubes. Similarities in the tube consequences of a set of different gestures may lead to similar phonological behavior: this is the source of acoustic features (Ladefoged, 1988a) such as [grave] and [flat] (Jakobson, Fant, & Halle, 1969) that organize different gestures (in our terms) according to standing wave nodes in the oral tube (Ohala, 1985; Ohala & Lorentz, 1977).

Within this tube perspective, there is a vocal tract hierarchy that characterizes an instantaneous time slice of the output of the vocal tract. Such a hierarchy may appear identical to the feature geometry of Sagey (1986). There are, however, two crucial differences. Unlike Sagey's root node, which 'corresponds neither to anatomy of the vocal tract nor to acoustic properties' (1986, p. 16), the highest node in the vocal tract hierarchy characterizes the physical state of the vocal tract at a single instant in time. In addition, tube geometry characterizes manner (CD) at more than just the highest level node. CD is characterized at each level of the hierarchy. Thus, each of the tubes and sets of compound tubes in the hierarchy will have its own effective constriction degree that is completely predictable from the CD of its constituents, and hence ultimately from the CD of the currently active gestures. At the vocal tract level, the effective CD of the supralaryngeal tract, taken together with the initiator power provided by the lungs, determines the nature of the actual airflow through the vocal tract: none (complete occlusion), turbulent flow, or laminar flow. This vocal tract hierarchy thus serves as the basis for a hierarchy for CD.

The important point here for phonology is that, in the output system, CD exists simultaneously at all the nodes in the vocal tract hierarchy; it is not isolable to any single node, but rather forms its own CD hierarchy. As we argue in § 3.2, all levels of the hierarchy are potentially important in accounting for phonological regularities, and may form the basis of a natural class. First, however, in § 3.1, we expand on the nature of tube geometry.

3.1 How tube geometry works

Figure 13 provides a pictorial representation of the tubes that constitute the vocal tract, embedded in the space defined by the anatomical hierarchy of articulatory geometry, in the vertical dimension, and by the constriction degree hierarchy of tube geometry, in the horizontal dimension. The constriction action of individual gestures is schematically represented by the small grey disks. Thus, the two dimensions in the figure serve to organize the gestures occurring in the vocal tract tubes, vertically in terms of articulatory geometry, and horizontally in terms of tube geometry. Looking at the tube structure in the center, we can see that the vocal tract branches into three distinct tubes or airflow channels: a Nasal tube, a Central tongue channel, and a Lateral tongue channel. The complex is terminated by GLO gestures at one end. At the other end, the Central and Lateral channels are together terminated by LIPS gestures.

The tube geometry tree, shown at the top of the figure, reflects the combinations of the tubes and their terminators. The tube level of the tree has five nodes, one for each of the three basic tubes and the two terminators. Within each of these basic tubes, the CD will be determined by the CD of the gestures acting within that tube, which are shown as subordinates to the tube nodes. For example, the CD of TT gestures will contribute to determining the CD of the Central tube. TB gestures will contribute to the Central and/or Lateral tube, depending on the constriction shape of the TB gesture. Each of the superordinate nodes corresponds to a tube junction, forming a compound tube from simpler tubes and/or the termination of a tube. Thus, the Central and Lateral tubes together form a compound tube labelled the Tongue tube. The compound Tongue tube is terminated by the LIPS configuration, which combines with the Tongue tube to form an Oral tube. The Oral and Nasal tubes form another compound tube, the Supralaryngeal, which is terminated by GLO gestures to form the overall Vocal Tract compound tube.


Figure 13. Vocal tract hierarchy: Articulatory and tube geometry.


The effective CD at each superordinate node can be predicted from the CD of the tubes being joined and the way they are joined. When tubes are joined in parallel, the compound tube has the CD of the widest component tube, that is, the maximum CD. When they are joined in series, the compound tube has the CD of the narrowest component tube, that is, the minimum CD. Terminations and multiple constrictions within the same tube work like tubes connected in series. Using these principles, it is possible to 'percolate' CD values from the values for individual gestures up to the various nodes in the hierarchy. Table 1 shows the possible values for CD of individual gestures, and Table 2 shows how to determine the CD values at each successive node up through the Supralaryngeal node, referred to hereafter as the Supra node (the Vocal Tract node will be discussed below).

Table 1. Possible constriction degree (CD) values at gestural tract variable level: open = (narrow OR mid OR wide).

a. LIPS  [CD]                  = clo crit open
b. TT    [CD]                  = clo crit open
c. TB    [CD, CS = normal]     = clo crit open
d. TB    [CD, CS = narrowed]   = clo crit open
e. VEL   [CD]                  = clo open
f. GLO   [CD]                  = clo crit open

GLO [crit] is value appropriate for voicing.

Table 2. Percolation of CD up through Supralaryngeal node.

a. Nasal    [CD] = VEL [CD]
   Lateral  [CD] = TB [CD, CS = narrowed]
   Central  [CD] = MIN (TT [CD], TB [CD, CS = normal])
b. Tongue   [CD] = MAX (Central [CD], Lateral [CD])
c. Oral     [CD] = MIN (Tongue [CD], LIPS [CD])
d. Supra    [CD] = MAX (Oral [CD], Nasal [CD])
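
The percolation in Table 2 is mechanical enough to state as a few lines of code. The following Python sketch assumes the three-way [clo]/[crit]/[open] values used in the percolation discussion ([open] standing in for the narrow/mid/wide range of Table 1); the function and parameter names are ours, for illustration only.

    # Minimal sketch of the percolation principles of Tables 1-2.
    CD_ORDER = ["clo", "crit", "open"]   # most constricted -> most open

    def widest(*cds):
        """MAX: tubes joined in parallel take the widest component CD."""
        return max(cds, key=CD_ORDER.index)

    def narrowest(*cds):
        """MIN: tubes in series (and terminations) take the narrowest CD."""
        return min(cds, key=CD_ORDER.index)

    def supra_cd(vel, tt, tb, tb_narrowed, lips):
        """Percolate gestural CDs up to the Supralaryngeal node."""
        nasal   = vel                          # Table 2a
        lateral = tb_narrowed                  # Table 2a
        central = narrowest(tt, tb)            # Table 2a: same-tube series
        tongue  = widest(central, lateral)     # Table 2b: parallel channels
        oral    = narrowest(tongue, lips)      # Table 2c: LIPS terminates
        return widest(oral, nasal)             # Table 2d: Oral and Nasal in parallel

    # A nasal stop: the Oral tube is [clo], but the Nasal tube is open,
    # so the Supra CD percolates to [open].
    print(supra_cd(vel="open", tt="open", tb="open",
                   tb_narrowed="open", lips="clo"))   # -> "open"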

The percolation principles follow from aerodynamic considerations. Basically, airflow through a tube system will follow the path of least resistance, as we can illustrate with examples of nasals and laterals. The Oral and Nasal tubes are connected in parallel, forming a compound Supralaryngeal tube. If the Oral tube is [closed] but the Nasal tube is [open], Table 2d indicates that the CD of the combined Supra tube will be the wider of the two openings, i.e., [open], as it is in a nasal stop. To take a less obvious example, consider the case where the Nasal tube is [open], but the Oral tube is [crit], i.e., appropriate for turbulence generation under the appropriate airflow conditions. Table 2d predicts that in this case as well the Supra CD is [open], implying that there will be no turbulence generated. This in turn predicts that nasalized fricatives, which consist of exactly the VEL [open] and Oral [crit] gestures, should not exist. That is, given that, at normal airflow rates, the air in this configuration will tend to follow the path of least resistance through the [open] nasal passage, nasalized fricatives would require abnormal degrees of total airflow in order to generate airflow through the [crit] constriction that is sufficient to produce turbulence.

Ohala (1975) has proposed that such nasalized fricatives are, indeed, rare, precisely because they are hard to produce. As Ladefoged and Maddieson (1986) argue, this could account for alternations in which the nasalized counterparts of voiced fricatives are voiced approximants, such as in Guarani (Gregores & Suárez, 1967, in Ladefoged & Maddieson, 1986). However, they also present evidence from Schadeberg (1982) for a nasalized labio-dental fricative in Umbundu. This suggests that the percolation principles may be overridden in certain special cases, perhaps by increased airflow settings.

At the highest level, that of the Vocal Tract, the glottal (GLO) CD and stiffness and the Supra CD combine with initiator (pulmonic) action to determine the actual aerodynamic and acoustic characteristics of airflow through the vocal tract. At this point, then, it becomes more appropriate to label the states of the Vocal Tract in terms of the characteristics of this 'output' airflow, rather than in terms of the CD, which is one of its determining parameters. A gross, but linguistically relevant, characterization of the output distinguishes the three states defined in Table 3a: occlusion, noise, and resonance. Assuming some 'average' value for initiator power, Table 3b shows how GLO and Supra CD jointly determine these properties. A complete closure of either system results in occlusion. If GLO is [crit] (appropriate position for voicing, assuming also appropriate stiffness), then it combines with an open Supralaryngeal tract to produce resonance. Any other condition produces noise. In some cases, e.g., GLO [open] and Supra [open], this is weak noise generated at the glottis (aspiration). In other cases, e.g., Supra [crit], the noise is generated in the Oral tube (frication). Further details of the output, such as where the turbulence is generated, and whether voicing accompanies occlusion or noise, are beyond our scope here. Ohala (1983) gives many examples of how the principles involved are relevant to aspects of phonological patterning.

Table 3. Acoustic consequences at Vocal Tract level.

a. VT outputs
   occlusion:  no airflow through VT; silence or low-amplitude voicing
   noise:      turbulent airflow
   resonance:  laminar airflow with voicing; formant structure

b. VT [CD]
   = occlusion / Supra [clo] OR GLO [clo]
   = resonance / Supra [open] AND GLO [crit]
   = noise / otherwise
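
Table 3b amounts to a small decision rule. Continuing the earlier percolation sketch (the helper and parameter names remain our own illustrative choices):

    # Sketch of Table 3b: Vocal Tract output state from the Supra and
    # GLO constriction degrees, assuming 'average' initiator power.
    def vt_output(supra, glo):
        if supra == "clo" or glo == "clo":
            return "occlusion"   # no airflow through the vocal tract
        if supra == "open" and glo == "crit":
            return "resonance"   # laminar airflow with voicing
        return "noise"           # turbulence: frication or aspiration

    print(vt_output(supra="open", glo="crit"))  # vowel-like -> "resonance"
    print(vt_output(supra="crit", glo="crit"))  # fricative  -> "noise"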

Finally, note that the percolation of CD through levels of the tube geometry can be defined at any instant in time, and depends on the actual size of the constriction within each tube at that point in time, regardless of whether some gesture is actively producing the constriction. Thus, the instantaneous CD is the output consequence of the default values for articulators not under active control, the history of the articulator movements, and the constellation of gestures currently active. Table 4 shows the default value we are assuming for each basic tube and terminator. (The default for Lateral is seriously oversimplified, and does not give the right Lateral CD in the case of Central fricatives, although percolation to the Oral level will work correctly).

Table 4. Default CD values for basic tubes and terminators.

Nasal    [CD] = clo
Lateral  [CD] = Central [CD]
Central  [CD] = open
LIPS     [CD] = open
GLO      [CD] = crit

Default values are determined by the effect of the model articulators' neutral configuration within a tube.


3.2 The constriction degree hierarchy

The importance of tube geometry for phonology resides in the fact that constriction degree is not isolable to any single level of the vocal tract. Rather, constriction degree exists at a number of levels simultaneously, with the CD at each level in this hierarchy defining a potential natural class. In this section, we present some examples in which CD from different levels is phonologically relevant, and argue that the CD hierarchy is important and clarifying for any featural system, as well as for gestural phonology.

We are proposing (1) that there is a universal geometry for CD in which all nodes in the CD hierarchy are simultaneously present, and (2) that different CD nodes are used to represent different natural classes. This approach differs from that adopted in current feature geometries, in which manner features such as [nasal], [sonorant], [continuant], or [consonantal] are usually represented at a single level in the feature hierarchy. Different geometries, however, choose different levels. For example, [nasal] has been variously considered to be a feature dependent on the highest (Root) node (McCarthy, 1988), on the Supralaryngeal node (via a manner node) (Clements, 1985), or on a Soft Palate node (Ladefoged, 1988a,b; Ladefoged & Halle, 1988; Sagey, 1986). It is possible that this variability in the treatment of [nasal], and other manner features, reflects the natural variation in the CD hierarchy, such that different phonological phenomena are captured using CD at different levels in the hierarchy.

To see how this might work, consider the four hierarchies in Figure 14, which use the tube geometry at the top of Figure 13 to characterize the linked CD structures for a vowel, a lateral, a nasal, and an oral stop. Since tube geometry plays the same role in the representation as the articulatory geometry in Figure 10, the tube hierarchies are rotated 90 degrees just as the anatomical hierarchy was. The circles are a graphic representation of the CD for each node, with the filled circles indicating CD = [clo], the wavy lines indicating CD = [crit], and the open circles indicating CD = [open] (the symbols indicate occlusion, noise, and resonance, respectively, at the Vocal Tract level). These CD hierarchies express the various classificatory (featural) similarities and dissimilarities among the four segments, but they do so at different levels.


Figure 14. CD hierarchies for vowel /a/, lateral /l/, nasal /n/, and oral stop /d/.

For example, the nasal has the same characterization as the vowel and lateral at the Supralaryngeal and Vocal Tract levels, but diverges at lower levels. This aspect of the CD hierarchy thus defines a phonological natural class consisting of nasals, laterals, and vowels, where the class is characterized by Supra [open] and VT ['open' = resonance]. This natural class is typically represented by the feature [sonorant], although different feature systems select one or the other of these levels in their definition of sonority. For example, Ladefoged's (1988a,b) acoustic feature distinction of [sonorant] utilizes the identity of the value in the CD hierarchy at the VT level. That is, for Ladefoged [+sonorant] differs from [-sonorant] in terms of output at the VT level. However, for Chomsky and Halle (1968) and Stevens (1972), it is the identity at the Supralaryngeal level that characterizes sonority, since they consider /h/ and /ʔ/ to be sonorants even though they differ from the nasal, vowel, and lateral at the VT level.


At the Oral level (and below), the nasal and stop display the same linked CD structure, which differs from that of the lateral and that of the vowel. That is, nasals and stops form a natural class in having Oral [clo], whereas laterals and vowels form a class in having Oral [open]. This difference in CD resides at a lower level of the hierarchy than the Vocal Tract or Supralaryngeal levels. And at the lowest levels, the Tube and Gestural, the nasal differs from the stop, lateral, and vowel in having both a Nasal [CD] and VEL [CD] that are [open]. Thus, constriction degree at various levels can serve to categorize and distinguish phonological units in different ways.

Within traditional feature systems, each of the natural classes described above receives a separate name, which obscures the systematic relation among the various classes. Moreover, even in feature geometry, when manner features are restricted to a single level there is no principled representation of the hierarchical relation among the natural classes. However, feature geometries sometimes incorporate pieces of the CD hierarchy, for example, in the assignment of [nasal] and [continuant] as dependents of two different nodes in the hierarchy (Soft Palate and Root: Sagey, 1986), or the assignment of [sonorant] as part of the Root node itself, but [continuant] as a dependent on the Root node (McCarthy, 1988). Note, however, that hierarchical relations among manner classes have to be stipulated in feature geometries, whereas such relations are inherent in the CD hierarchy. In addition, the percolation principles of tube geometry provide a mechanism for relating values of the different levels to each other, again something that would need to be stipulated in a hierarchy not based on tube geometry.

All the levels of the CD hierarchy appear to be useful for establishing natural classes, and for relating the CD natural classes to one another. In addition, various kinds of phonological patterns, such as phonological alternations, can be examined in light of this hierarchy. In particular, we can investigate whether regularities are best expressed as processes that treat CD as tied to a particular gesture (as an 'input' parameter) or as a consequence of gestural combination at the various 'output' levels of the CD hierarchy. For example, the cases discussed in § 2.2.1 (e.g., /k#k/ → /h#k/) can be best described as processes that delete entire gestures. CD is tied to the gesture and is deleted along with the articulator set and CL. This deletion automatically accounts (by percolation) for the CD changes at other levels of the hierarchy.

In other cases, however, there is much overdetermination: the phonological behavior can be described equally well from more than one perspective. For example, McCarthy (1988) discusses the common instances in which /s/ → /h/, and glottalized consonants /p' t' k'/ → /ʔ/. Both of these examples can be straightforwardly analyzed as deletion of the oral gesture, as in the cases in § 2.2.1. But it is also true that, in both cases, the Vocal Tract output CD is unchanged by the gestural deletion. In the first case, it remains noisy, and in the second it remains an occlusion. Thus, the description of the phonological behavior could also focus on the (apparent) relative independence of the VT [CD] and the articulator set involved: the articulator set(s) changes, but CD does not. The equivalence of the two perspectives results from the fact that the percolated values of VT [CD] are the same for the two gestures alone and for their combination. Thus, deletion of one or the other will not change the VT [CD]. Examples of this kind, of which there are likely to be many, can be viewed from a dual perspective.

One example of a process for which it seems (at first glance) that a dual perspective cannot be maintained involves assimilation. Steriade (in press) has argued that assimilation is problematic for the gestural approach, since sometimes only the place, and not the manner, features are assimilated. For example, in Kpelle nasals assimilate in place but not manner to a following stop or fricative, so that /N-f/ becomes a sequence (broadly) transcribed as [mv] (Sagey, 1986). As Steriade points out, this separation of the place and manner features appears to argue against a gestural analysis, in which the assimilation results from increased overlap between the oral gesture for the fricative (LIPS [crit dent]) and the velum lowering gesture (VEL [wide]). Since the nasal does not become a nasal fricative, it would appear either that there is no increased overlap, or that the LIPS [crit] gesture changes its CD when it overlaps the velum lowering gesture. From the perspective of the CD hierarchy, the Supra CD of the nasal is the same before and after the assimilation ([open]), and therefore the Supra level (or VT level) is the significant one for CD, rather than the Gestural level. However, the percolation principles from tube geometry predict that the Supra CD will be [open] even if the nasal is overlapped by a fricative gesture, as derived in (10).

(10)  Supra [open]
         Oral [crit]
            Tongue [open]
               Central [open]
            LIPS [crit]
         Nasal [open]
            VEL [wide]

There could also be a TT gesture in the derivation that is either hidden or deleted; the Supra CD would still be [open]. Thus, an increase in gestural overlap could produce both the place assimilation as well as the correct CD at the Supra level, without any change of the Gestural CD. (An articulatory study is needed to determine whether the Gestural CD does indeed change).
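
In terms of the earlier percolation sketch, the prediction in (10) can be checked mechanically (again using the illustrative supra_cd helper, with VEL [wide] falling within the [open] range):

    # The Kpelle case of (10): a velum lowering gesture overlapped by
    # a LIPS [crit] fricative gesture still percolates to Supra [open].
    print(supra_cd(vel="open", tt="open", tb="open",
                   tb_narrowed="open", lips="crit"))   # -> "open"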

The examples presented so far are processes that are either best described with CD attached firmly to a given gesture as it undergoes change (e.g., deletion or sliding), or are at least equally well described that way. However, there are phenomena whose description requires a 'loosening' of the relation between CD and the gesture. One such situation, in which the Gestural CD is effectively separated from its articulator set, occurs in historical change (Browman & Goldstein, in press), in particular in the types of historical change that Ohala (1981) terms 'listener-based' sound changes. Such cases arise particularly when gestural overlap leads to ambiguity in the acoustic signal that the listener can parse into one of two gestural patterns. In such cases, Browman and Goldstein (in press) argue that there is a (historical) 'reassignment' of the constriction degree parameter between overlapping gestures. This analysis was used to account for the /x/ → /f/ changes in English words like cough, based on an argument originally put forth by Pagliuca (1982). Briefly, the analysis proposed that the [crit] descriptor for the TB gesture for /x/ was re-assigned to an overlapping LIPS gesture.

A related synchronic situation, involving two overlapping gestures in a complex segment, is discussed in Sagey (1986). Her analysis suggests that manner features must be represented on more than a single node in a feature hierarchy.


Specifically, Sagey demonstrates that multiple oral gestures cohering in a complex segment may be restricted to a single distinctive constriction degree. She represents this single distinctive constriction degree at the highest level in the hierarchy, but still must represent which particular articulation bears the contrastive CD. Thus, CD is specified at two different levels. Sagey diagrams the relation between the two levels by a looping arrow drawn between the distinctive level and the 'major' articulator making the distinctive constriction. In the CD hierarchy, the Oral CD would be the lowest possible distinctive level for these double articulations. The gesture that carries the distinctive CD could be marked as [head], and would automatically agree in CD with the mother node, as in (11).

(11)  Oral [αCD]
         Gesture [αCD] [head]
         Gesture

Moreover, in examples such as this, the percolation principles do not contribute to determining the CD of the distinctive node, except in a negative way to be discussed shortly. Sagey argues that the distinctive CD cannot be predicted from physical principles, since in Margi labio-coronals, the coronal articulation is major, and hence the less radical constriction is the distinctive constriction, rather than the more radical one. Thus, the relation between the Oral and Gestural levels in (11) must be a statement about a functional phonological unit, a gestural constellation, rather than a statement characterizing an instantaneous time slice in terms of tube geometry. That is, only one of the gestures in the constellation may bear a distinctive CD. Nevertheless, the importance of physical overlap relations can also be seen in this example. Maddieson and Ladefoged (1989) have shown that complex segments such as [gb] are not completely synchronous, apparently lending support to the distinction between phonetic ordering, on the one hand, and phonological unordering, on the other hand. However, we argue that the ordering of gestures within a complex segment is phonologically important in exactly the case of a phonological unit like the Margi labio-coronals, where the distinctive value of CD at higher levels is not predicted from the lower level CDs by the percolation principles. If two gestures overlap completely (i.e., are precisely coextensive), and there are no additional phonetic cues of the sort discussed in Maddieson & Ladefoged, then the percolation principles will determine the higher level constriction degree throughout the entire time-course of the phonological unit, and the distinctive CD will fail to be conveyed. For example, in the case of /bz/, if the two gestures were precisely aligned, the (distinctive) frication would never appear at the VT level. Only if the gestures are slightly offset can the distinctive CD be communicated.

In general, the CD hierarchy affords a structure within which a typology of phonological processes can be developed, based on the CD level that seems most relevant. It is an interesting research challenge to develop this typology, and to ask how it is related to other ways of categorizing the processes. For example, are there systematic differences in relevant CD level that correlate with whether the process involves spreading (sliding) rules or deletion rules, or with whether the process is prelexical or postlexical?

4 SUMMARY

We have argued that dynamically-defined articulatory gestures are the appropriate units to serve as the atoms of phonological representation. Gestures are a natural unit, not only because they involve task-oriented movements of the articulators, but because they arguably emerge as prelinguistic discrete units of action in infants. The use of gestures, rather than constellations of gestures as in Root nodes, as basic units of description makes it possible to characterize a variety of language patterns in which gestural organization varies. Such patterns range from the misorderings of disordered speech through phonological rules involving gestural overlap and deletion to historical changes in which the overlap of gestures provides a crucial explanatory element.

Gestures can participate in language patterns involving overlap because they are spatiotemporal in nature and therefore have internal duration. In addition, gestures differ from current theories of feature geometry by including the constriction degree as an inherent part of the gesture. Since the gestural constrictions occur in the vocal tract, which can be characterized in terms of tube geometry, all the levels of the vocal tract will have effective constriction degrees, leading to a constriction degree hierarchy. The values of the constriction degree at each higher level node in the hierarchy can be predicted on the basis of the percolation principles and tube geometry. In this way, the use of gestures as atoms can be reconciled with the use of constriction degree at various levels in the vocal tract (or feature geometry) hierarchy.

The phonological notation developed for the gestural approach might usefully be incorporated, in whole or in part, into other phonologies. Five components of the notation were discussed, all derived from the basic premise that gestures are the primitive phonological unit, organized into gestural scores. These components include (1) constriction degree as a subordinate of the articulator node and (2) stiffness (duration) as a subordinate of the articulator node. That is, both CD and duration are inherent to the gesture. The gestures are arranged in gestural scores using (3) articulatory tiers, with (4) the relevant geometry (articulatory, tube, or feature) indicated to the left of the score and (5) structural information above the score, if desired. Association lines can also be used to indicate how the gestures are combined into phonological units. Thus, gestures can serve both as characterizations of articulatory movement data and as the atoms of phonological representation.

REFERENCESAbercrombie, D. (1967). Elements of general phonetics. Edinburgh:

Edinburgh University Press.Abry, C., & B0e, L.J. (1986). 'Laws' for Ups. Speech Comr.unkation,

5, 97-104.Anderson, J., & Ewen, C. J. (1987). Principles of dependency

phonology. Cambridge: Cambridge University Press.Anderson, S. R. (1974). The organisation ol phonology. New York:

Academic Press.Barry, M. C. (1985). A palatographic study of connected speech

processes. Cambridge Papers in Phone( ':..s and ExperimentalLinguistics, 4, 1-16.

Bell -Bert, F., & Harris, K. S. (1981). A temporal model of speechproduction. Phonetics, 38, 9-20.

106

Page 108: ED 321 318 - ERIC · Vial L Hanson Katherine S. Harris Leonard Kate Rena Arens Krakow Andrea G. Levitt Alvin M. Liberman Isabelle Y. Liberman Diane Lilt-Martin Leigh Lisker' Anders

Articulatory Gestures as Phonological Units 99

Best, C. T., & Wilkenfeld, D. (1988). Phonologically motivated substitutions in a 20-22 month old's imitations of intervocalic alveolar stops. Journal of the Acoustical Society of America, 84(S1), S114. (Abstract)
Boucher, V. (1988). For a phonology without 'rules': Peripheral timing-structures and allophonic patterns in French and English. In S. Embleton (Ed.), The Fourteenth LACUS Forum 1987 (pp. 123-140). Lake Bluff, IL: Linguistic Association of Canada and the US.
Boysson-Bardies, B. de, Sagart, L., & Durand, C. (1984). Discernible differences in the babbling of infants according to target language. Journal of Child Language, 11, 1-15.
Boysson-Bardies, B. de, Sagart, L., Hallé, P., & Durand, C. (1986). Acoustic investigations of cross-linguistic variability in babbling. In B. Lindblom & R. Zetterström (Eds.), Precursors of early speech (pp. 113-126). Basingstoke, Hampshire: MacMillan.
Browman, C. P., & Goldstein, L. (1985). Dynamic modeling of phonetic structure. In V. Fromkin (Ed.), Phonetic linguistics (pp. 35-53). New York: Academic.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
Browman, C. P., & Goldstein, L. (1987). Tiers in articulatory phonology, with some implications for casual speech. Haskins Laboratories Status Report on Speech Research, SR-92 (pp. 1-30). To appear in J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Browman, C. P., & Goldstein, L. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140-155.
Browman, C. P., & Goldstein, L. (1989). 'Targetless' schwa: An articulatory analysis. Paper presented at the Second Conference on Laboratory Phonology, Edinburgh, June 29-July 2, 1989.
Browman, C. P., & Goldstein, L. (in press). Gestural structures and phonological patterns. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Browman, C. P., Goldstein, L., Kelso, J. A. S., Rubin, P., & Saltzman, E. (1984). Articulatory synthesis from underlying dynamics. Journal of the Acoustical Society of America, 75, S22-S23. (Abstract)
Browman, C. P., Goldstein, L., Saltzman, E., & Smith, C. (1986). GEST: A computational model for speech production using dynamically defined articulatory gestures. Journal of the Acoustical Society of America, 80, S97. (Abstract)
Brown, G. (1977). Listening to spoken English. London: Longman.
Campbell, L. (1974). Phonological features: Problems and proposals. Language, 50, 52-65.
Catford, J. C. (1977). Fundamental problems in phonetics. Bloomington: Indiana University Press.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper and Row.
Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook, 2, 225-252.
Clements, G. N., & Keyser, S. J. (1983). CV phonology: A generative theory of the syllable. Cambridge, MA: MIT Press.
Ewen, C. J. (1982). The internal structure of complex segments. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations (pp. 27-67). Cinnaminson, NJ: Foris Publications U.S.A.
Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in early language acquisition. Language, 51, 419-439.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics, 8, 113-133.
Fowler, C. A. (1981). A relationship between coarticulation and compensatory shortening. Phonetica, 38, 35-50.
Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived rhythms of speech: Cyclic production of vowels in sequences of monosyllabic stress feet. Journal of Experimental Psychology: General, 112, 386-412.
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. (1980). Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production (pp. 373-420). New York: Academic Press.
Fry, D. B. (1966). The development of the phonological system in the normal and the deaf child. In F. Smith & G. Miller (Eds.), The genesis of language: A psycholinguistic approach (pp. 187-206). Cambridge, MA: MIT Press.
Fujimura, O. (1981a). Temporal organization of articulatory movements as a multidimensional phrasal structure. Phonetica, 38, 66-83.
Fujimura, O. (1981b). Elementary gestures and temporal organization: What does an articulatory constraint mean? In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 101-110). Amsterdam: North Holland.
Gimson, A. C. (1962). An introduction to the pronunciation of English. London: Edward Arnold Publishers, Ltd.
Glass, L., & Mackey, M. C. (1988). From clocks to chaos. Princeton: Princeton University Press.
Goldstein, L., & Browman, C. P. (1986). Representation of voicing contrasts using articulatory gestures. Journal of Phonetics, 14, 339-342.
Gregores, E., & Suárez, J. A. (1967). A description of colloquial Guaraní. The Hague: Mouton.
Guy, G. R. (1980). Variation in the group and the individual: The case of final stop deletion. In W. Labov (Ed.), Locating language in time and space (pp. 1-36). New York: Academic Press.
Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics, 51, 347-356.
Halle, M. (1982). On distinctive features and their articulatory implementation. Natural Language and Linguistic Theory, 1, 91-105.
Halle, M., & Stevens, K. (1971). A note on laryngeal features. MIT-RLE Quarterly Progress Report, 101, 198-213.
Hammond, M. (1988). On deriving the well-formedness condition. Linguistic Inquiry, 19, 319-325.
Hardcastle, W. J., & Roach, P. J. (1979). An instrumental investigation of coarticulation in stop consonant sequences. In H. Hollien & P. Hollien (Eds.), Current issues in the phonetic sciences (pp. 531-540). Amsterdam: John Benjamins.
Hayes, B. (1986). Assimilation as spreading in Toba Batak. Linguistic Inquiry, 17, 467-499.
Jakobson, R., Fant, C. G. M., & Halle, M. (1969). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: The MIT Press.
Jespersen, O. (1914). Lehrbuch der Phonetik [Textbook of phonetics]. Leipzig: B. G. Teubner.
Keating, P. A. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA Working Papers in Phonetics, 62, 1-13.
Keating, P. A. (1988). Palatals as complex segments: X-ray evidence. UCLA Working Papers in Phonetics, 69, 77-91.
Kelso, J. A. S., & Tuller, B. (1987). Intrinsic time in speech production: Theory, methodology, and preliminary observations. In E. Keller & M. Gopnik (Eds.), Motor and sensory processes of language (pp. 203-222). Hillsdale, NJ: Erlbaum.
Kelso, J. A. S., Vatikiotis-Bateson, E., Saltzman, E. L., & Kay, B. (1985). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. Journal of the Acoustical Society of America, 77, 266-280.


Kent, R. D. (1983). The segmental organization of speech. In P. F. MacNeilage (Ed.), The production of speech (pp. 57-89). New York: Springer-Verlag.
Kent, R. D., & Murray, A. D. (1982). Acoustic features of infant vocalic utterances at 3, 6, and 9 months. Journal of the Acoustical Society of America, 72, 353-365.
Kohler, K. (1976). Die Instabilität wortfinaler Alveolarplosive im Deutschen: Eine elektropalatographische Untersuchung [The instability of word-final alveolar plosives in German: An electropalatographic investigation]. Phonetica, 33, 1-30.
Koopmans-van Beinum, F. J., & Van der Stelt, J. M. (1986). Early stages in the development of speech movements. In B. Lindblom & R. Zetterström (Eds.), Precursors of early speech (pp. 37-50). Basingstoke, Hampshire: MacMillan.
Krakow, R. A. (1989). The articulatory organization of syllables: A kinematic analysis of labial and velar gestures. Doctoral dissertation, Yale University.
Ladefoged, P. (1980). What are linguistic sounds made of? Language, 56, 485-502.
Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York: Harcourt Brace Jovanovich.
Ladefoged, P. (1988a). Hierarchical features of the International Phonetic Alphabet. UCLA Working Papers in Phonetics, 70, 1-12.
Ladefoged, P. (1988b). The many interfaces between phonetics and phonology. UCLA Working Papers in Phonetics, 70, 13-23.
Ladefoged, P., & Halle, M. (1988). Some major features of the International Phonetic Alphabet. Language, 64, 577-582.
Ladefoged, P., & Maddieson, I. (1986). Some of the sounds of the world's languages. UCLA Working Papers in Phonetics, 64, 1-137.
Lass, R. (1984). Phonology: An introduction to basic concepts. Cambridge: Cambridge University Press.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
Lieberman, P. (1984). The biology and evolution of language. Cambridge: Harvard University Press.
Lindau, M. (1978). Vowel features. Language, 54, 541-563.
Lindblom, B. (1986). Phonetic universals in vowel systems. In J. J. Ohala & J. Jaeger (Eds.), Experimental phonology (pp. 13-44). Orlando: Academic Press.
Lindblom, B., MacNeilage, P. F., & Studdert-Kennedy, M. (1983). Self-organizing processes and the explanation of phonological universals. In B. Butterworth, B. Comrie, & Ö. Dahl (Eds.), Explanations of linguistic universals (pp. 182-203). The Hague: Mouton.
Linker, W. (1982). Articulatory and acoustic correlates of labial activity in vowels: A cross-linguistic study. UCLA Working Papers in Phonetics, 56, 1-134.
Locke, J. L. (1983). Phonological acquisition and change. New York: Academic Press.
Locke, J. L. (1986). The linguistic significance of babbling. In B. Lindblom & R. Zetterström (Eds.), Precursors of early speech (pp. 143-160). Basingstoke, Hampshire: MacMillan.
Lombardi, L. (1987). On the representation of the affricate. Ms., University of Massachusetts, Amherst.
Lynch, J. (1983). On the Kuman liquids. Language and Linguistics in Melanesia, 14, 98-112.
Maddieson, I. (1987). Revision of the IPA: Linguo-labials as a test case. Journal of the International Phonetic Association, 17, 26-30.
Maddieson, I., & Ladefoged, P. (1989). Multiply articulated segments and the feature hierarchy. UCLA Working Papers in Phonetics, 72, 116-138.
Marshall, R. C., Gandour, J., & Windsor, J. (1988). Selective impairment of phonation: A case study. Brain and Language, 35, 313-339.
McCarthy, J. J. (1988). Feature geometry and dependency. Phonetica, 45, 84-108.
McCarthy, J. J. (1989). Linear ordering in phonological representation. Linguistic Inquiry, 20, 71-99.
Nittrouer, S., Studdert-Kennedy, M., & McGowan, R. S. (1989). The emergence of phonetic segments: Evidence from the spectral structure of fricative-vowel syllables spoken by children and adults. Journal of Speech and Hearing Research, 32, 120-132.
Nittrouer, S., Munhall, K., Kelso, J. A. S., Tuller, B., & Harris, K. S. (1988). Patterns of interarticulator phasing and their relation to linguistic structure. Journal of the Acoustical Society of America, 84, 1653-1661.
Ohala, J. J. (1975). Phonetic explanations for nasal sound patterns. In C. A. Ferguson, L. M. Hyman, & J. J. Ohala (Eds.), Nasálfest (pp. 289-316). Stanford: Stanford University.
Ohala, J. J. (1981). The listener as a source of sound change. In C. S. Masek, R. A. Hendrick, & M. F. Miller (Eds.), Papers from the parasession on language and behavior (pp. 178-203). Chicago: Chicago Linguistic Society.
Ohala, J. J. (1983). The origin of sound patterns in vocal tract constraints. In P. F. MacNeilage (Ed.), The production of speech (pp. 189-216). New York: Springer-Verlag.
Ohala, J. J. (1985). Around 'flat.' In V. A. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 223-241). New York: Academic Press.
Ohala, J. J., & Lorentz, J. (1977). The story of [w]: An exercise in the phonetic explanation for sound patterns. Proceedings, Annual Meeting of the Berkeley Linguistics Society, 3, 577-599.
Oller, D. K. (1986). Metaphonology and infant vocalizations. In B. Lindblom & R. Zetterström (Eds.), Precursors of early speech (pp. 21-36). Basingstoke, Hampshire: MacMillan.
Oller, D. K., & Eilers, R. E. (1982). Similarity of babbling in Spanish- and English-learning babies. Journal of Child Language, 9, 565-577.
Oller, D. K., & Eilers, R. E. (1988). The role of audition in infant babbling. Child Development, 59, 441-449.
Ostry, D. J., & Munhall, K. (1985). Control of rate and duration of speech movements. Journal of the Acoustical Society of America, 77, 640-648.
Pagliuca, W. (1982). Prolegomena to a theory of articulatory evolution. Doctoral dissertation, State University of New York at Buffalo. Ann Arbor, MI: University Microfilms International.
Pike, K. L. (1943). Phonetics. Ann Arbor: University of Michigan Press.
Plotkin, V. Y. (1976). Systems of ultimate units. Phonetica, 33, 81-92.
Principles of the International Phonetic Association (1949, reprinted 1978). London: University College.
Recasens, D. (in preparation). The articulatory characteristics of alveolopalatal and palatal consonants. Haskins Laboratories, New Haven, CT.
Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.
Sagey, E. C. (1986). The representation of features and relations in non-linear phonology. Unpublished doctoral dissertation, MIT.
Sagey, E. C. (1988). On the ill-formedness of crossing association lines. Linguistic Inquiry, 19, 109-118.
Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 129-144). (Experimental Brain Research Series 15). New York: Springer-Verlag.
Saltzman, E., Goldstein, L., Browman, C. P., & Rubin, P. E. (1988). Dynamics of gestural blending during speech production. Presented at the annual meeting of the International Neural Network Society (INNS), Boston, September 6-10, 1988.



Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.
Saltzman, E., & Munhall, K. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology.
Saltzman, E., Rubin, P. E., Goldstein, L., & Browman, C. P. (1987). Task-dynamic modeling of interarticulator coordination. Journal of the Acoustical Society of America, 82, S15. (Abstract)
Schadeberg, T. C. (1982). Nasalization in UMbundu. Journal of African Languages and Linguistics, 4, 109-132.
Schmidt, R. C., Carello, C., & Turvey, M. T. (1987). Visual coupling of biological oscillators. PAW Review, 2, 22-24.
Selkirk, E. (1988). A two-root theory of length. Presented at the 19th Meeting of the New England Linguistic Society (NELS), November 1988.
Shockey, L. (1974). Phonetic and phonological properties of connected speech. Ohio State Working Papers in Linguistics, 17, iv-143.
Steriade, D. (in press). Gestures and autosegments: Comments on Browman and Goldstein's 'Gestures in articulatory phonology.' In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view (pp. 51-66). New York: McGraw Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Stevens, K. N., Keyser, S. J., & Kawasaki, H. (1986). Toward a phonetic and phonological theory of redundant features. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability in speech processes (pp. 426-449). Hillsdale, NJ: Lawrence Erlbaum Associates.
Straight, H. S. (1976). The acquisition of Maya phonology: Variation in Yucatec child language. New York: Garland.
Studdert-Kennedy, M. (1987). The phoneme as a perceptuomotor structure. In A. Allport, D. MacKay, W. Prinz, & E. Scheerer (Eds.), Language perception and production (pp. 67-84). London: Academic Press.
Thelen, E. (1981). Rhythmical behavior in infancy: An ethological perspective. Developmental Psychology, 17, 237-257.
Thompson, J. M. T., & Stewart, H. B. (1986). Nonlinear dynamics and chaos. New York: John Wiley & Sons.
Thráinsson, H. (1978). On the phonology of Icelandic preaspiration. Nordic Journal of Linguistics, 1(1), 3-54.
Vatikiotis-Bateson, E. (1988). Linguistic structure and articulatory dynamics. Bloomington: Indiana Linguistics Club.
Vennemann, T., & Ladefoged, P. (1973). Phonetic features and phonological features. Lingua, 32, 61-74.
Vihman, M. M. (in press). Ontogeny of phonetic gestures: Speech production. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Vihman, M. M., Macken, M. A., Miller, R., Simmons, H., & Miller, J. (1985). From babbling to speech: A re-assessment of the continuity issue. Language, 61, 397-445.
Wood, S. (1982). X-ray and model studies of vowel articulation. Working Papers in Linguistics, 23, Lund University.

FOOTNOTES
*Phonology, 6, 201-251 (1989).

†Also Department of Linguistics, Yale University.

1. A simple dynamical system consists of a mass attached to the end of a spring (a damped mass-spring model). If the mass is pulled, stretching the spring beyond its rest length (equilibrium position), and then released, the system will begin to oscillate. The resultant movement pattern of the mass will be a damped sinusoid described by the solution to the equation below. When such an equation is used to model the movements of coordinated sets of articulators, the 'object' (motion variable) in the equation is considered to be the tract variable, for example, lip aperture (LA). Thus, the sinusoid's trajectory would describe how lip aperture changes over time:

$m\ddot{x} + b\dot{x} + k(x - x_0) = 0$

where $m$ = mass of the object, $b$ = damping of the system, $k$ = stiffness of the spring, $x_0$ = rest length of the spring (equilibrium position), $x$ = instantaneous displacement of the object, $\dot{x}$ = instantaneous velocity of the object, and $\ddot{x}$ = instantaneous acceleration of the object.
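The equation can also be integrated numerically. The following minimal sketch, with arbitrary illustrative parameter values rather than values fitted to articulatory data, traces how a tract variable such as lip aperture would relax to its equilibrium position as a damped sinusoid.

```python
# m*x'' + b*x' + k*(x - x0) = 0, integrated with a simple Euler scheme.
m, b, k = 1.0, 0.8, 16.0   # mass, damping, stiffness (arbitrary values)
x0 = 0.0                   # rest length / equilibrium position
x, v = 1.0, 0.0            # start stretched beyond rest length, at rest
dt = 0.001                 # time step in seconds

trajectory = []
for _ in range(3000):      # simulate 3 s
    a = -(b * v + k * (x - x0)) / m   # acceleration from the equation above
    v += a * dt
    x += v * dt
    trajectory.append(x)

# x relaxes toward x0 as a damped sinusoid; read as a tract variable such as
# lip aperture (LA), the trajectory describes how the aperture changes in time.
print(trajectory[0], trajectory[-1])
```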

2. For ease of reference to the gesture, it is possible to use either a bundle of gestural descriptors (or a selected subset) or gestural symbols. We have been trying different approaches to the question of what gestural symbols should be; our present best estimate is that gestures should be treated like archiphonemes. Thus, our current proposal is to use the capitalized form of the voiced IPA symbol for oral gestures, capitalized and diacritized {H} for glottal gestures, and {±N} for [velic closed] and [velic open] gestures, respectively. In order to clearly distinguish gestural symbols from other phonetic symbols, we enclose them in curly brackets: { }. This approach should permit gestural descriptions to draw upon the full symbol resources of the IPA, rather than attempting to develop an additional set of symbols. However, we welcome comments from others on this decision, particularly on the proposal to capitalize gestural symbols, a choice that minimizes confusion with phonemic transcriptions but that leads to a conflict with uvular symbols in the current IPA system.


Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100.

The Perception of Phonetic Gestures*

Carol A. Fowler† and Lawrence D. Rosenblum††

We have titled our presentation "The perception of phonetic gestures" as if phonetic gestures are perceived. By phonetic gestures we refer to organized movements of one or more vocal-tract structures that realize phonetic dimensions of an utterance (cf. Browman & Goldstein, 1986; in press a). An example of a gesture is bilabial closure for a stop, which includes contributions by the jaw and the upper and lower lips. Gestures are organized into larger segmental and suprasegmental groupings, and we do not intend to imply that these larger organizations are not perceived as well. We focus on gestures to emphasize a claim that, in speech, perceptual objects are fundamentally articulatory as well as linguistic.

That is, in speech perception, articulatory events have a status quite different from that of their acoustic products. The former are perceived, whereas the latter are the means (or one of the means) by which they are perceived.

A claim that phonetic gestures are perceived is not uncontroversial, of course, and there are other points of view (e.g., Massaro, 1987; Stevens & Blumstein, 1981). We do not intend to consider these other views here, however, but instead to focus on agreements and disagreements between two theoretical perspectives from which the claim is made. Accordingly, we begin by summarizing some of the evidence that, in our view, justifies it.

Phonetic gestures are perceived: The sources of evidence

1. Correspondence failures between acoustic signal and percept: Correspondences between gestures and percept

Perhaps the most compelling evidence that gestures, and not their acoustic products, are perceptual objects is the failure of dimensions of speech percepts to correspond to obvious dimensions of the acoustic signal and their correspondence, instead, to phonetically-organized articulatory behaviors that produce the signal. We offer three examples, all of them implicating articulatory gestures as perceptual objects and the third showing most clearly that the perceived gestures are not surface articulatory movements but, rather, linguistically-organized gestures.

*Preparation of this manuscript was supported by a fellowship from the John Simon Guggenheim Memorial Foundation to the first author and by grants NIH-NICHD HD-01994 and NINCDS NS-13617 to Haskins Laboratories.



a. Synthetic /di/ and /du/

One example from the early work at Haskins Laboratories (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) is of synthetic /di/ and /du/. Monosyllables, such as those in Figure 1, can be synthesized that consist only of two formants. The information specifying /d/ (rather than /b/ or /g/) in both syllables is the second formant transition. These transitions are very different in the two syllables, and, extracted from their syllables, they sound very different too. Each sounds more-or-less like the frequency glide it resembles in the visible display. Neither sounds like /d/. In the context of their respective syllables, however, they sound alike and they sound like /d/.
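As a rough feel for such stimuli, the sketch below synthesizes a sine-wave analog of a two-formant pattern with a linear second-formant transition into a steady vowel. The formant values and timing are loose assumptions for illustration, not those of the original Haskins stimuli, and sinusoids are only a crude stand-in for formants.

```python
import math, struct, wave

SR = 16000  # sample rate (Hz)

def tone(track, seconds):
    """A unit sinusoid whose instantaneous frequency follows track(t), t in [0,1)."""
    n = int(SR * seconds)
    phase, out = 0.0, []
    for i in range(n):
        phase += 2.0 * math.pi * track(i / n) / SR
        out.append(math.sin(phase))
    return out

dur = 0.3  # 300 ms syllable; the transition occupies the first ~60 ms
f1 = tone(lambda t: 250.0 + 50.0 * min(t / 0.2, 1.0), dur)    # F1: 250 -> 300 Hz
f2 = tone(lambda t: 2200.0 + 400.0 * min(t / 0.2, 1.0), dur)  # F2: 2200 -> 2600 Hz
samples = [0.5 * a + 0.3 * b for a, b in zip(f1, f2)]

with wave.open("di_like.wav", "w") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(SR)
    out.writeframes(b"".join(struct.pack("<h", int(32000 * s)) for s in samples))
```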

The consonantal segments in /di/ and /du/ are produced alike too, by a constriction and release gesture of the tongue tip against the alveolar ridge of the palate. When listeners perceive the synthetic /di/ and /du/ syllables of Figure 1, their percepts correspond to the implied constriction and release gestures, not, it seems, to the context-sensitive acoustic signal.



[Figure omitted: schematic two-formant patterns for /di/ and /du/; vertical axis frequency (Hz), roughly 0-3000.]

Figure 1. Synthetic syllables /di/ and /du/. The second formant transitions identify the initial consonant as /d/ rather than as /b/ or /g/.

b. Functional equivalence of acoustic "cues"

We expect listeners to be very good at distinguishing an interval of silence from nonsilence (from a set of frequency glides, for example). Too, we expect them to distinguish acoustic signals that differ in two ways more readily than signals that differ in just one of the two ways. Both of these expectations are violated (another example of noncorrespondence) if the silence and glides are joint acoustic products of a common constriction and release gesture for a stop consonant.

Fitch, Halwes, Erickson and Liberman (1980) created synthetic syllables identified as "slit" or "split" by varying the duration of a silent interval following the fricative and manipulating the presence or absence of transitions for a bilabial stop following the silent interval. A relatively long silent interval and the presence of transitions both signal a bilabial stop, the silent interval cuing the closure and the transitions the release. Fitch et al. found that pairs of syllables differing on both cue dimensions, duration of silence and presence/absence of transitions, were either more discriminable than pairs differing in one of these ways or less discriminable, depending on how the cues were combined. A syllable with a long silent interval and transitions was highly discriminable from a syllable with a shorter silent interval and no transitions; the one was identified as "split" and the other as "slit." A syllable with a short silent interval and transitions was nearly indiscriminable from one with a longer interval and no transitions; both were identified as "split." Syllables differing in two ways are indiscriminable just when the acoustic cues that distinguish them are "functionally equivalent," that is, when they cue the same articulatory gesture. A long silent interval does not normally sound like a set of frequency glides, but it does in a context in which each specifies a consonantal constriction.

c. Perception of intonation

The findings just summarized, among others, reveal that listeners perceive gestures. Apparently, listeners do not perceive the acoustic signal per se.

Nor, however, do they perceive "raw" articulatory motions as such. Rather, they perceive linguistically-organized (phonetic) gestures. Research on the various ways in which fundamental frequency (henceforth, f0) is perceived shows this most clearly.

Perceived intonational peak height will not, in general, correspond to the absolute rate at which the vocal folds open and close during production of the peak. Instead, perception of the peak corresponds to just those influences on the rate of opening and closing that are caused by gestures intended by the talker to affect intonational peak height. (Largely, intonational melody is implemented by contraction and relaxation of muscles of the larynx that tense the vocal folds; see, e.g., Ohala, 1978.) There are other influences on the rate of vocal fold opening and closing that may either decrease or increase f0. Some of these influences, due to lung deflation during an expiration ("declination," Gelfer, Harris, & Baer, 1987; Gelfer, Harris, Collier, & Baer, 1986) or to segmental perturbations reflecting vowel height (e.g., Lehiste & Peterson, 1961) and obstruent voicing (e.g., Ohde, 1984), are largely or entirely automatic consequences of other things that talkers are doing (producing an utterance on an expiratory airflow, producing a close or open vowel [Honda, 1981], producing a voiced or voiceless obstruent [Ohde, 1984]; Löfqvist, Baer, McGarr, & Seider Story, in press). They do not sound like changes in pitch; rather, they sound like what they are: information for early-to-late serial position in an utterance in the case of declination (Pierrehumbert, 1979; see also Lehiste, 1982), and information for vowel height (Reinholt Peterson, 1986; Silverman, 1987) or consonant voicing (e.g., Silverman, 1986) in the case of segmental perturbations.


As we will suggest below (under "How acoustic structure may serve as information for speech perceivers"), listeners apparently use configurations of changes in different acoustic variables to recover the distinct, organized articulatory systems that implement the various linguistic dimensions of talkers' utterances. By using acoustic information in this way, listeners can recover what Liberman (1982) has called the talker's "phonetic intents."

2. Audio-visual integration of gestural information

A video display of a face mouthing /ga/ synchronized with an acoustic signal of the speaker saying /ba/ is heard most typically as "da" (MacDonald & McGurk, 1978). Subjects' identifications of syllables presented in this type of experiment reflect an integration of information from the optical and acoustic sources. Too, as Liberman (1982) points out, the integration affects what listeners experience hearing to an extent that they cannot tell what contribution to their perceptual experience is made by the acoustic signal and what by the video display.1

Why does integration occur? One answer is that both sources of information, the optical and the acoustic, provide information apparently about the same event of talking, and they do so by providing information about the talkers' phonetic gestures.

3. Shadowing

Listeners' latency to repeat a syllable they hear is very short: in Porter's research (Porter, 1978; Porter & Lubker, 1980), around 180 ms on average. Even though these latencies are obtained in a choice reaction time procedure (in which the vocal response required is different for different stimuli to respond), latencies approach simple reaction times (in which the same response occurs to any stimulus to respond), and they are much shorter than choice reaction times using a button press.

Why should these particular choice reaction times be so fast? Presumably, the compatibility between stimulus and response explains the fast response times. Indeed, it effectively eliminates the element of choice. If listeners perceive the talker's phonetic gestures, then the only response requiring essentially no choice at all is one that reproduces those gestures.

The motor theory

Throughout most of its history, the motor theory (e.g., Liberman, Cooper, Harris, & MacNeilage, 1963; Liberman et al., 1967; Liberman & Mattingly, 1985; see also, Cooper, Delattre, Liberman, Borst, & Gerstman, 1952) has been the only theory of speech perception to identify the phonetic gesture as an object of perception. Here we describe the motor theory by discussing what, more precisely, the motor theorists have considered to be the object of perception, how they characterize the process of speech perception, and why, recently, they have introduced the idea that speech perception is accomplished by a specialized module.

What is perceived for the motor theorist?

Coarticulation is the reason why the acoustic signal appears to correspond so badly to the sequences of phonemes that talkers intend to produce. Due to coarticulation, phonemes are produced in overlapping time frames so that the acoustic signal is everywhere (or nearly everywhere; see, e.g., Stevens & Blumstein, 1981) context-sensitive. This makes the signal a complex "code" on the phonemes of the language, not a cipher, like an alphabet.2 In "Perception of the speech code" (1967), Liberman and his colleagues speculated that coarticulatory "encoding" is, in part, a necessary consequence of properties of the speech articulators (their sluggishness, for example). However, in their view, coarticulation is also promoted both by the nature of phonemes themselves, that they are realized by sets of subphonemic features,3 and by the listener's short-term memory, which would be overtaxed by the slow transmission rate of an acoustic cipher.

In producing speech, talkers exploit the fact that the different articulators (the lips, velum, jaw, etc.) can be independently controlled. Subphonemic features, such as lip rounding, velum lowering and alveolar closure, each use subsets of the articulators, often just one; therefore, more than one feature can be produced at a time. Speech can be produced at rapid rates by allowing "parallel transmission" of the subphonemic features of different phonemes. This increases the transmission rates for listeners, but it also creates much of the encoding that is considered responsible for the apparent lack of invariance between acoustic and phonetic segments.

The listener's percept corresponds, it seems, neither to the encoded cues in the acoustic signal nor even to the also-encoded succession of vocal tract shapes during speech production, but instead to a sequence of discrete, unencoded phonemes, each composed of its own component subphonemic features.


To explain why "perception mirrors articulation more closely than sound" (p. 453) and (yet) achieves recovery of discrete unencoded phonemes, the motor theorists proposed as a first hypothesis that perceivers somehow access their speech-motor systems in perception and that the percept they achieve corresponds to a stage in production before encoding of the speech segments takes place. In "Perception of the speech code," the stage was one in which "motor commands" to the muscles were selected to implement subphonemic features. In "The motor theory revised" (Liberman & Mattingly, 1985), a revision to the theory reflects developments in our understanding of motor control. Evidence suggests that activities of the vocal tract are products of functional couplings among articulators (e.g., Folkins & Abbs, 1975, 1976; Kelso, Tuller, Vatikiotis-Bateson, & Fowler, 1984), which produce gestures as defined earlier, not independent movements of the articulators identified with subphonemic features in "Perception of the speech code." In "The motor theory revised," control structures for gestures have replaced motor commands for subphonemic features as invariants of production and as objects of perception for listeners.4 Like subphonemic features, control structures are abstract, prevented by coarticulation from making public appearances in the vocal tract. Liberman and Mattingly write of the perceptual objects of the revised theory:

We would argue, then, that the gestures do have characteristic invariant properties, as the motor theory requires, though these must be seen, not as peripheral movements, but as the more remote structures that control the movements. These structures correspond to the speaker's intentions. (p. 23)

In recovering abstract gestures, processes of speech perception yield quite different kinds of perceptual objects than general auditory perception. In auditory perception, more generally, according to Liberman and Mattingly, listeners hear the signal as "ordinary sound" (p. 6); that is, they hear the acoustic signal as such. In other publications, Mattingly and Liberman (1988) refer to this apparently more straightforward perceptual object as "homomorphic," in contrast to objects of speech perception, which are "heteromorphic." An example they offer of homomorphic auditory perception is perception of isolated formant transitions, which sound like the frequency glides they resemble in a spectrographic display.

How perception takes place in the motor theory

In the motor theory, listeners use "analysis by synthesis" to recover phonetic gestures from the encoded, informationally-impoverished acoustic signal. This aspect of the theory has never been worked out in detail. However, in general, analysis by synthesis consists in analyzing a signal by guessing how the signal might have been produced (e.g., Stevens, 1960; Stevens & Halle, 1964). Liberman and Mattingly refer to an "internal, innately specified vocal-tract synthesizer that incorporates complete information about the anatomical and physiological characteristics of the vocal tract and also about the articulatory and acoustic consequences of linguistically significant gestures" (p. 26). The synthesizer computes candidate gestures and then determines which of those gestures, in combination with others identified as ongoing in the vocal tract, could account for the acoustic signal.
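The logic of analysis by synthesis can be sketched schematically. In the toy example below, the forward model and the candidate set are hypothetical stand-ins for the innate vocal-tract synthesizer the theory posits: each candidate gesture is synthesized and the one that best accounts for the observed signal is selected.

```python
import numpy as np

def analyze_by_synthesis(observed, candidates, synthesize):
    """Return the name of the candidate whose synthesized signal best matches."""
    error = {name: float(np.sum((synthesize(params) - observed) ** 2))
             for name, params in candidates.items()}
    return min(error, key=error.get)

# Toy forward model: a "gesture" is an (onset, target) frequency pair and its
# acoustic consequence is a linear formant transition sampled at 10 points.
synthesize = lambda p: np.linspace(p[0], p[1], 10)
candidates = {"d-like": (1800.0, 2600.0), "g-like": (3000.0, 2600.0)}

observed = synthesize(candidates["d-like"]) + np.random.normal(0.0, 20.0, 10)
print(analyze_by_synthesis(observed, candidates, synthesize))  # usually "d-like"
```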

Speech perception as modular

If speech perception does involve accessing the speech-motor system, then it must indeed be special and quite distinct from general auditory perception. It is special in its objects of perception, in the kinds of processes applied to the acoustic signal, and presumably in the neural systems dedicated to those processes as well. Liberman and Mattingly propose that speech perception is achieved by a specialized module.

A module (Fodor, 1983) is a cognitive system that tends to be narrowly specialized ("domain specific"), using computations that are special ("eccentric") to its domain; it is computationally autonomous (so that different systems do not compete for resources) and prototypically is associated with a distinct neural substrate. In addition, modules tend to be "informationally encapsulated," bringing to bear on the processing they do only some of the relevant information the perceiver may have; in particular, processing of "input" (perceptual) systems, prime examples of modules, is protected early on from bias by "top-down" information.

The speech perceptual system of the motor theory has all of these characteristics. It is narrowly specialized and its perception-production link is eccentric; moreover, it is associated with a specialized neural substrate (e.g., Kimura, 1961). In addition, as the remarkable phenomenon of duplex perception (e.g., Liberman, Isenberg, & Rakerd, 1981; Rand, 1974) suggests, the speech perceiving system is autonomous and informationally encapsulated.



In duplex perception as it is typically investigated (e.g., Liberman et al., 1981; Mann & Liberman, 1983; Repp, Milburn, & Ashkenas, 1983), most of an acoustic CV syllable (the "base" at the left of Figure 2) is presented to one ear while the remainder, generally a formant transition (either of the "chirps" on the right side of Figure 2), is presented to the other ear. Heard in isolation, the base is ambiguous between "da" and "ga," but listeners generally report hearing "da." (It was identified as "da" 87% of the time in the study by Repp et al., 1983.) In isolation, the chirps sound like the frequency glides they resemble; they do not sound speech-like. Presented dichotically, listeners integrate the chirp and the base, hearing the integrated "da" or "ga" in the ear receiving the base. Remarkably, in addition, they hear the chirp in the other ear. Researchers who have investigated duplex perception describe it as perception of the same part of an acoustic signal in two ways simultaneously. If that characterization is correct, it implies strongly that the percepts are outputs of two distinct and autonomous perceptual systems, one specialized for speech and the other perhaps general to other acoustic signals.


A striking characteristic of speech perceptual systems that integrate syllable fragments presented to different ears is their imperviousness to the information, in the spatial separation of the fragments, that they cannot possibly be part of the same spoken syllable; an instance, perhaps, of informational encapsulation.

In recent work, Mattingly and Liberman (in press) have revised, or expanded on, Fodor's view of modules by proposing a distinction between "closed" and "open" modules. Closed modules, including the speech module and a sound-localization module, for example, are narrowly specialized as Fodor has characterized modules more generally. In addition (among other special properties), they yield heteromorphic percepts, that is, percepts whose dimensions are not those of the proximal stimulation. Although Mattingly and Liberman characterize the heteromorphic percept in this way, in terms of what it does not conform to, it appears that the heteromorphic percept can be characterized in a more positive way as well. The dimensions of heteromorphic percepts are those of distal events, not of proximal stimulation. The speech module renders phonetic gestures; the sound-localization module renders location in space. By contrast, open modules are sort of "everything-else" perceptual systems.

[Figure omitted: schematic spectrograms; panels labeled "Normal syllables" and "Duplex-producing syllables" (base plus "d" or "g" third-formant transition); frequency in kHz against time in ms, 0-250 ms.]

Figure 2. Stimuli that yield duplex perception. The base is presented to one ear and the third-formant transitions to the other. In the ear to which the base is presented, listeners hear the syllable specified jointly by the base and the transitions; in the other ear, they hear the transitions as frequency glides. (Figure adapted from Whalen & Liberman, 1987.)
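The dichotic presentation itself is mechanically simple to construct. A hypothetical sketch, assuming prepared mono 16-bit files base.wav and chirp.wav at a common sample rate, routes the base to one ear (left channel) and the transition to the other (right channel).

```python
import wave

def read_mono(path):
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        return w.readframes(w.getnframes()), w.getframerate()

base, sr = read_mono("base.wav")    # most of the CV syllable
chirp, _ = read_mono("chirp.wav")   # the isolated formant transition
n = min(len(base), len(chirp)) // 2 * 2  # common length, whole 16-bit samples

# Interleave samples: base -> left ear, chirp -> right ear.
frames = bytearray()
for i in range(0, n, 2):
    frames += base[i:i + 2] + chirp[i:i + 2]

with wave.open("dichotic.wav", "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(sr)
    out.writeframes(bytes(frames))
```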


An open auditory-perceptual module is responsible for perception of most sounds in the environment. According to the theory, outputs of open modules are homomorphic.

In the context of this account of auditory perception, the conditions under which duplex perception is studied are seen as somehow tricking the open module into providing a percept of the isolated formant transition even though the transition is also being perceived by the speech module. Accordingly, two percepts are provided for one acoustic fragment; one percept is homomorphic and the other is heteromorphic.

Prospectus

Our brief overview of the motor theory obviously cannot do justice to it. In our view, it is, to date, superior to other theories of speech perception in at least two major respects: in its ability to handle the full range of behavioral findings on speech perception (in particular, of course, the evidence that listeners recover phonetic gestures) and in having developed its account of speech in the context of a more general theory of biological specializations for perception.

Our purpose here, however, is not just to praise the theory, but to challenge it as well, with the further aim of provoking the motor theorists either to buttress their theory where it appears to us vulnerable, or else to revise it further.

We will raise three general questions from the perspective of our own, direct-realist theory (Fowler, 1986a, b; Rosenblum, 1987). First, we question the inference, from evidence that listeners recover phonetic gestures, that the listener's own speech-motor system plays a role in perception. The nature of the challenge we mount to this inference leads to a second one. We question the idea that, whereas a specialized speech module (and other closed modules) render heteromorphic percepts, other percepts are homomorphic. Finally, we challenge the idea in any case that duplex perception reveals that speech perception is achieved by a closed module.

Standing behind all of these specific questions we raise about claims of the motor theory is a general issue that needs to be confronted by all of us who study speech perception and, for that matter, perception more generally. The issue is one of determining when behavioral data warrant inferences being drawn about perceptual processes taking place inside perceivers and when the data deserve accounting instead in terms of the nature of events taking place publicly when something is perceived.

Does perceptual recovery of phonetic gestures implicate the listener's speech-motor system?

In our view, the evidence that perceivers recover phonetic gestures in speech perception is incontrovertible5 and any theory of speech perception is inadequate unless it can provide a unified account of those findings. However, the motor theorists have drawn an inference from these findings that, we argue, is not warranted by the general observation that listeners recover gestures. The inference is that recovery of gestures implies access by the perceiver to his own speech-motor system. It is notable, perhaps, that, in neither "Perception of the speech code" nor "The motor theory revised" do Liberman and his colleagues offer any evidence in support of this claim except evidence that listeners recover gestures (and that human left-cerebral hemispheres are specialized for speech and especially for phonetic perception [e.g., Kimura, 1961; Liberman, 1974; Studdert-Kennedy & Shankweiler, 1970]).

There is another way to explain why listeners recover phonetic gestures. It is that phonetic gestures are among the "distal events" that occur when speech is perceived and that perception universally involves recovery of distal events from information in proximal stimulation.

Distal events universally are perceptual objects: Proximal stimuli universally are not

Consider first visual perception observed from outside the perceiver.6 Visual perceivers recover properties of objects and events in their environment ("distal events"). They can do so, in part, because the environment supplies information about the objects and events in a form that their perceptual systems can use. Light reflects from objects and events, which structure it lawfully; given a distal event and light from some source, the reflected light must have the structure that it has. To the extent that the structure in the light is also specific to the properties of a distal event that caused it, it can serve as information to a perceiver about its distal source. The reflected light ("proximal stimulation") has another property that permits it its central role in perception. It can stimulate the visual system of a perceiver and thereby impart its structure to it. From there, the perceiver can use the structure as information for distal-event perception.


The reflected light does not provide information to the visual system by picturing the world. Information in reflected light for "looming" (that is, for an object on a collision course with the perceiver's head), for example, is a certain manner of expansion of the contours of the object's reflection in the light, that progressively covers the contours of optical reflections of immobile parts of the perceiver's environment. When an object looms, it does not grow; it approaches. However, its optical reflection grows, and, confronted with such an optic array, perceivers (from fiddler crabs to kittens to rhesus monkeys to humans [Schiff, 1965; Schiff, Caviness, & Gibson, 1962]) behave as if they perceive an object on a collision course; that is, they try to avoid it.

Two related conclusions from this characterization of visual perception are, first, that observers see distal events based on information about them in proximal stimulation and, second, that, in Mattingly and Liberman's terms, visual perception therefore is quite generally heteromorphic. It is not merely heteromorphic in respect to those aspects of stimulation handled by closed modules (for example, one that recovers depth information from binocular disparity); it is generally the case that the dimensions of the percept correspond with dimensions of distal objects and events and not necessarily with those of a distal-event-free description of the proximal stimulation.7

Auditory perception is analogous to visual perception in its general character, viewed, once again, from outside the perceiver. Consider any sounding object, a ringing bell, for example. The ringing bell is a "distal event" that structures an acoustic signal. The structuring of the air by the bell is lawful and, to the extent that it also tends to be specific to its distal source, the structure can provide information about the source to a sensitive perceiver. Like reflected light, the acoustic signal (the proximal stimulation) in fact has two critical properties that allow it to play a central role in perception. It is lawfully structured by some distal event and it can stimulate the auditory system of perceivers, thereby imparting its structure to it. The perceiver then can use the structure as information for its source.

As for structure in reflected light, structure in an acoustic signal does not resemble the sound-producing source in any way. Accordingly, if auditory perception works similarly to visual perception, that is, if perceivers use structure in acoustic signals to recover their distal sources, then auditory percepts, like visual percepts, will be heteromorphic.

Liberman and Mattingly (1985; Mattingly & Liberman, 1988) suggest, however, that in general, auditory perceptions are homomorphic. We agree that our intuitions are less clear here than they are in the case of visual perception. However, it is an empirical question whether dimensions of listeners' percepts are better explained in terms of dimensions of distal events or of a distal-event-free description of proximal stimulation. To date the question is untested, however; for whatever reason, researchers who study auditory perception rarely study perception of natural sound-producing events (see, however, Repp, 1987; VanDerVeer, 1979; Warren & Verbrugge, 1984).

Now consider speech perception. In speech, the distal event (at least the event in the environment that structures the acoustic speech signal) is the moving vocal tract. If, as we propose, the vocal tract produces phonetic gestures, then the distal event is, at the same time, the set of phonetic gestures that compose the talker's spoken message. The proximal stimulus is the acoustic signal, lawfully structured by movement in the vocal tract. To the extent that the structure in the signal also tends to be specific to the events that caused it, it can serve as information about those events to sensitive perceivers. The information that proximal stimulation provides will be about the phonetic gestures of the vocal tract. Accordingly, if speech perception works like visual perception, then recovery of phonetic gestures is not eccentric and does not require eccentric processing by a speech module. It is, instead, yet another instance of recovery of distal events by means of lawfully-generated structure in proximal stimulation.

The general point we hope to make is that, arguably, all perception is heteromorphic, with dimensions of percepts always corresponding to those of distal events, not to distal-event-free descriptions of proximal stimuli. Speech is not special in that regard. A more specific point is that even if evidence were to show that speech perceivers do access their speech-motor systems, that perceptual process would not be needed to provide the reason why listeners' percepts are heteromorphic. The reason percepts are heteromorphic is that perceivers universally use proximal stimuli as information about events taking place in the world; they do not use them as perceptual objects per se.


Are phonetic gestures public or private?

Although in "Perception of the speech code" and"The motor theory revised," evidence thatlisteners recover gestures is the only evidencecited in favor of the view that perceivers accesstheir speech mot-ar systems, that evidence is notthe only reason why the motor theorists and othertheorists invoke a construct inside the perceiverrather than the proximal stimulation outside toexplain why the percept has the character it does.A very important reason why, for the motortheorists, the proximal stimulation is not by itselfsufficient to specify phonetic gestures is that, intheir view, phonetic gestures are abstract controlstructures corresponding to the speakersilitentions, but not to the movements actuallytaking place in the vocal tract. If phoneticgestures aren't "out there" in the vocal tract, thenthey cannot be analogous to other distal events,because they cannot, themselves, lawfullystructure the acoustic signal.

In our view, this characterization of phonetic gestures is mistaken, however. We can identify two considerations that appear to support it, but we find neither convincing. One is that any gesture of the vocal tract is merely a token action. Yet perceivers do not just recognize the token, they recognize it as a member of a larger linguistically-significant category. That seems to localize the thing perceived in the mind of the perceiver, not in the mouth of the talker. More than that, the same collections of token gestures may be identified as tokens of different categories by speakers of different languages. (So, for example, speakers of English may identify a voiceless unaspirated stop in stressed syllable-initial position as a /b/, whereas speakers of languages in which voiceless unaspirated stops can appear stressed-syllable initially may identify it as an instance of a /p/.) Here, it seems, the information for category membership cannot possibly be in the gestures themselves or in the proximal stimulation; it must be in the head of the perceiver. The second consideration is that coarticulation, by most accounts, prevents nondestructive realization of phonetic gestures in the vocal tract. We briefly address both considerations.

Yet another analogy: There are chairs in the world that do not look very much like prototypical chairs. Recognizing them as chairs may require learning how people typically use them (learning their "proper function" in Millikan's terms [1984]). By most accounts, learning involves some enduring change inside the perceiver. Notice, however, that even if it does, what makes the token chair a chair remains its properties and its use in the world prototypically as a chair. Too, whatever perceivers may learn about that chair and about chairs in general is only what they learn; the chair itself and the means by which its type-hood can be identified remain unquestionably out there in the world. Phonetic gestures and phonetic segments are like chairs (in this respect). Token instances of bilabial closure are members of a type because the tokens all are products of a common coupling among jaw and lips realized in the vocal tract of talkers who achieve bilabial closure. Instances of bilabial closure in stressed-syllable-initial position that have a particular timing relation to a glottal opening gesture are tokens of a phonological category, /b/, in some languages and of a different category, /p/, in others because of the different ways that they are deployed by members of the different language communities. That differential deployment is what allowed descriptive linguists to identify members of phonemic categories as such, and presumably it is also what allows language learners to acquire the phonological categories of their native language. By most accounts, when language learners discover the categories of their language, the learning involves enduring changes inside the learner. However, even if it does, it is no more the case that the phonetic gestures or the phonetic segments move inside the mind than it is that chairs move inside when we learn how to recognize them as such. What we have learned is what we know about chairs and phonetic segments; it is not the chairs or the phonetic segments themselves. They remain outside.

Turning to coarticulation, it is described in the motor theory as "encoding," by Ohala (e.g., 1981) as "distortion," by Daniloff and Hammarberg (1973) as "assimilation," and by Hockett (1955) as "smashing" and "rubbing together" of phonetic segments (in the way that raw eggs would be smashed and rubbed together were they sent through a wringer). None of these characterizations is warranted, however. Coarticulation may instead be characterized as gestural layering: a temporally staggered realization of gestures that sometimes do and sometimes do not share one or more articulators.

In fact, this kind of gestural layering occurs commonly in motor behavior. When someone walks, the movement of his or her arm is seen as pendular. However, the surface movement is a complex (layered) vector including not only the swing of the arm, but also movement of the whole body in the direction of locomotion. This layering is not described as "encoding," "distortion" or even as assimilation of the arm movement to the movement of the body as a whole. And for good reason; that is not what it is. The movement reflects a convergence of forces of movement on a body segment. The forces are separate for the walker, information in proximal stimulation allows their parsing (Johansson, 1973), and perceivers detect their separation.



There is evidence already suggesting that at least some of coarticulation is gestural layering (Carney & Moll, 1971; Öhman, 1966; also see Browman & Goldstein, in press a), not encoding or distortion or assimilation. There is also convincing evidence that perceivers recover separate gestures more-or-less in the way that Johansson suggests they recover separate sources of movement of body segments in perception of locomotion. Listeners use information for a coarticulating segment that is present in the domain of another segment as information for the coarticulating segment itself (e.g., Fowler, 1984; Mann, 1980; Whalen, 1984); they do not hear the coarticulated segment as assimilated or, apparently, as distorted or encoded (Fowler, 1981; 1984; Fowler & Smith, 1986).

Our colleagues Catherine Browman and Louis Goldstein (1985, 1986, in press a, b) have proposed that phonetic primitives of languages are gestural, not abstract featural. Our colleague Elliot Saltzman (1986; Saltzman & Kelso, 1987; see also Kelso, Saltzman, & Tuller, 1986) is developing a model that implements phonetic gestures as functional couplings among the articulators and that realizes the gestural layering characteristic of coarticulation. To the extent that these approaches both succeed, they will show that phonetic gestures (speakers' intentions) can be realized in the vocal tract nondestructively, and hence can structure acoustic signals directly.

Do listeners need an innate vocal tract synthesizer to recognize acoustic reflections of phonetic gestures? Although it might seem to help, it cannot be necessary, because there is no analogous way to explain how observers recognize most distal events from their optical reflections. Somehow the acoustic and optical reflections of a source must identify the source on their own. In some instances, we begin to understand the means by which acoustic patternings can specify their gestural sources. We consider one such instance next.

How acoustic structure may serve as information for gestures.

We return to the example previously described of listeners' perception of those linguistic dimensions of an utterance that are cued in some way by variation in fo. A variety of linguistic and paralinguistic properties of an utterance have converging effects on fo. Yet listeners pull apart those effects in perception.

What guides the listeners' factoring of converging effects on fo? Presumably, it is the configuration of acoustic products of the several gestures that have effects, among others, on fo. Intonational peaks are local changes in an fo contour that are effected by means that, to a first approximation, only affect fo; they are produced, largely, by contraction and relaxation of muscles that stretch or shorten the vocal folds (e.g., Ohala, 1978). In contrast, declination is a global change in fo that, excepting the initial peak in a sentence, tracks the decline in subglottal pressure (Gelfer et al., 1985; Gelfer et al., 1987). Subglottal pressure affects not only fo, but amplitude as well, and several researchers have noticed that amplitude declines in parallel with fo and resets when fo resets at major syntactic boundaries (e.g., Breckenridge, 1977; Maeda, 1976). The parallel decline in amplitude and fo constitutes information that pinpoints the mechanism behind the fo decline: gradual lung deflation, incompletely offset by expiratory-muscle activity. That mechanism is distinct from the mechanism by which intonational peaks are produced. Evidence that listeners pull apart the two effects on fo (Pierrehumbert, 1979; Silverman, 1981) suggests that they are sensitive to the distinct gestural sources of these effects on fo.
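The declination account makes a claim that is easy to check quantitatively: fo and overall amplitude should fall with similar time courses and reset together. The following is a minimal sketch in Python of such a check, assuming fo and RMS-amplitude contours have already been extracted at a fixed frame rate; the function name, the 100-frames-per-second rate, and the zero-marks-unvoiced convention are our illustrative assumptions, not the procedure of the studies cited.

import numpy as np

def declination_parallelism(f0_hz, amp_rms, frame_rate_hz=100.0):
    """Fit linear trends to the voiced frames of an utterance and
    correlate the fo contour with the amplitude contour (in dB)."""
    f0 = np.asarray(f0_hz, dtype=float)
    amp = np.asarray(amp_rms, dtype=float)
    voiced = f0 > 0                              # convention: 0 marks unvoiced frames
    t = np.flatnonzero(voiced) / frame_rate_hz   # times of voiced frames, in seconds
    f0_slope = np.polyfit(t, f0[voiced], 1)[0]   # Hz per second (negative = declination)
    amp_db = 20 * np.log10(amp[voiced])          # express amplitude in dB
    amp_slope = np.polyfit(t, amp_db, 1)[0]      # dB per second
    r = np.corrcoef(f0[voiced], amp_db)[0, 1]    # frame-by-frame correlation
    return f0_slope, amp_slope, r

Parallel negative slopes, together with resets at the same syntactic boundaries, are the configuration that, on this account, points the listener to subglottal pressure as the common cause.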

By the same token, fo perturbations due to height differences among vowels are not confused by listeners with information for intonational peak height even though fo differences due to vowel height are local, like intonational peaks, and are similar in magnitude to differences among intonational peaks in a sentence (Silverman, 1987). The mechanisms for the two effects on fo are different, and, apparently, listeners are sensitive to that. Honda (1981) shows a strong correlation between activity of the genioglossus muscle, active in pulling the root of the tongue forward for high vowels, and intrinsic fo of vowels. Posterior fibers of the genioglossus muscle insert into the hyoid bone of the larynx. Therefore, contraction of the genioglossus may pull the hyoid forward, rotating the thyroid cartilage to which the vocal folds attach, and thereby may stretch the vocal folds. Other acoustic consequences of genioglossus contraction, of course, are changes in the resonances of the vocal tract, which reflect movement of the tongue. These changes, along with those in fo (and perhaps others as well) pinpoint a phonetic gesture that achieves a vowel-specific change in vocal-tract shape. If listeners can use that configuration of acoustic reflections of tongue-movement (or, more likely, of coordinated tongue and jaw movement) to recover the vocalic gesture, then they can pull effects on fo of the vocalic gesture from those for the intonation contour that cooccur with them.

Listeners do just that. In sentence pairs such as "They only feast before fasting" and "They only fast before feasting," with intonational peaks on the "fVst" syllables, listeners require a higher peak on "feast" in the second sentence than on "fast" in the first sentence in order to hear the first peak of each sentence as higher than the second (Silverman, 1987). Compatibly, among steady-state vowels on the same fo, more open vowels sound higher in pitch than more closed vowels (Stoll, 1984). Intrinsic fo of vowels does not contribute to perception of an intonation contour or to perception of pitch. But it is not thrown away by perceivers either. Rather, along with spectral information for vowel height, it serves as information for vowel height (Reinholt Peterson, 1986).

We will not review the literature on listeners' use of fo perturbations due to obstruent voicing except to say that it reveals the same picture of the perceiver as the literature on listeners' use of information for vowel height (for a description of the fo perturbations: Ohde, 1984; for studies of listeners' use of the perturbations: Abramson & Lisker, 1985; Fujimura, 1971; Haggard, Ambler, & Callow, 1970; Silverman, 1986; for evidence that listeners can detect the perturbations when they are superimposed on intonation contours: Silverman, 1986). As the motor theory and the theory of direct perception both claim, listeners' percepts do not correspond to superficial aspects of the acoustic signal. They correspond to gestures, signaled, we propose, by configurations of acoustic reflections of those gestures.

Does duplex perception reveal a closed speech module?

We return to the phenomenon of duplex perception and consider whether it does convincingly reveal distinct closed and open modules for speech perception and general auditory perception respectively. As noted earlier, duplex perception is obtained, typically, when most of the acoustic structure of a synthetic syllable is presented to one ear, and the remainder, usually a formant transition, is presented to the other ear (refer to Figure 2). In such instances, listeners hear two things. In the ear that gets most of the signal, they hear a coherent syllable, the identity of which is determined by the transition presented to the other ear. At the same time, they hear a distinct, non-speech 'chirp' in the ear receiving the transition. The percept is duplex: the transition is heard as a critical part of a speech syllable, hypothetically as a result of its being processed by the speech module, and it is heard simultaneously as a non-speech chirp, hypothetically as a result of its being processed also by an open auditory module (Liberman & Mattingly, 1985). Here we offer a different interpretation of the findings.

Whalen and Liberman (1987) have recently shown that duplex perception can occur with monaural or diotic presentation of the base and transition of a syllable. In this case, duplexity is attained by increasing the intensity of the third formant transition relative to the base until listeners hear both an integrated syllable (/da/ or /ga/ depending on the transition) and a non-speech 'whistle' (sinusoids were used for transitions). In the experiment, subjects first were asked to label the isolated sinusoidal transitions as "da" or "ga". Although they were consistent in their labeling, reliably identifying one whistle as "da" and the other as "ga," their overall accuracy was not greater than chance. About half the subjects were consistently right and the remainder were consistently wrong. The whistles are distinct, but they do not sound like "da" or "ga." Next, Whalen and Liberman determined 'duplexity thresholds' for listeners. They presented the base and one of the sinusoids simultaneously and gave listeners control over the intensity of the sinusoid. Listeners adjusted its intensity to the point where they just heard a whistle. At threshold, subjects were able to match these duplex sinusoids to sinusoids presented in isolation. Finally, subjects were asked to identify the integrated speech syllables as "da" or "ga" at sinusoid intensities both 6 dB above and 4 dB below the duplexity threshold. Subjects were consistently good at these tasks, yielding accuracy scores well above 90%.
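For readers who want the intensity manipulation concrete, here is a minimal sketch in Python of how a sinusoidal transition can be mixed with a base at a level stated in dB relative to a listener's duplexity threshold. The signal names and the threshold variable are illustrative assumptions on our part; Whalen and Liberman's actual stimulus preparation is not described here.

import numpy as np

def mix_relative_to_threshold(base, transition, threshold_gain, db_re_threshold):
    """Scale `transition` db_re_threshold dB above (+) or below (-) the linear
    gain at which it is just heard as a separate whistle, then mix diotically."""
    gain = threshold_gain * 10.0 ** (db_re_threshold / 20.0)  # dB -> linear factor
    return np.asarray(base) + gain * np.asarray(transition)

# The identification test used levels 6 dB above and 4 dB below threshold, e.g.:
# stim_above = mix_relative_to_threshold(base, sine_f3, g_threshold, +6.0)
# stim_below = mix_relative_to_threshold(base, sine_f3, g_threshold, -4.0)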

In the absence of any transition, listeners hear only the base and identify it as "da" most of the time. When a sinusoidal transition is present but at intensities below the duplexity threshold, subjects hear only the unambiguous syllable ("da" or "ga" depending on the transition). Finally, when the intensity of the transition reaches and exceeds the duplexity threshold, subjects hear the "da" or "ga" and they hear a whistle at the same time: i.e., the transition is duplexed.

This experiment reveals two new aspects of the duplex phenomenon. One is that getting a duplex percept requires a sufficiently high intensity of the transition. A second is that the transition integrates with the syllable at intensities below the duplexity threshold. Based on this latter finding, Whalen and Liberman conclude that processing of the sinusoid as speech has priority. It is as if a (neurally-encoded) acoustic signal must first pass through the speech module, at which point portions of the signal that specify speech events are peeled off. After the speech module takes its part, any residual is passed on to the auditory module where it is perceived homomorphically. Mattingly and Liberman (1988) refer to this priority of speech processing as "preemptiveness," and Whalen and Liberman (1987) suggest that it reflects the "profound biological significance of speech."

There is another way to look at these findings, however. They suggest that duplex perception does not, in fact, involve the same acoustic fragment being perceived in two ways simultaneously. Rather, part of the transition integrates with the syllable and the remainder is heard as a whistle or chirp.8 As Whalen and Liberman themselves describe it:

". . . the phonetic mode takes precedence inprocessing the transitions, using them for its speciallinguistic purposes until, having appropriated itsshare, it passes the remainder to be perceived bythe nonspeech system as auditory whistles."(Whalen & Liberman, 1987, p. 171; our italics).

This is important, because in earlier reports of duplex perception, it was the apparent perception of the transition in two different ways at once that was considered strong evidence favoring two distinct perceptual systems, one for speech and one for general auditory perception. In addition, research to date has only looked for preemptiveness using speech syllables. Accordingly, it is premature to conclude that speech especially is preemptive. Possibly acoustic fragments integrate preferentially whenever the integrated signal specifies some coherent sound-producing event.

We have recently looked for duplex perception in perception of nonspeech sounds (Fowler and Rosenblum, in press). We predicted that it would be possible to observe duplex perception and preemptiveness whenever two conditions are met: 1) a pair of acoustic fragments is presented that, integrated, specify a natural distal event; and 2) one of the fragments is unnaturally intense. Under these conditions, the integrated event should be preemptive and the intense fragment should be duplexed regardless of the type of natural sound-producing event that is involved, whether it is speech or non-speech, and whether it is profoundly biologically significant or biologically trivial.

There have been other attempts to get duplex perception for nonspeech sounds. All the ones of which we are aware have used musical stimuli, however (e.g., Collins, 1985; Pastore, Schmuckler, Rosenblum, & Szczesiul, 1983). We chose not to use musical stimuli because it might be argued that there is a music module. (Music is universal among human cultures, and there is evidence for an anatomical specialization of the brain for music perception; e.g., Shapiro, Grossman, & Gardner, 1981.) These considerations led us to choose a non-speech event that evolution could not have anticipated. We chose an event involving a recent human artifact: a slamming metal door.

To generate our stimuli, we recorded a heavy metal door (of a sound-attenuating booth) being slammed shut. A spectrogram of this sound can be seen in Figure 3a. To produce our 'chirp,' we high-pass filtered the signal above 3000 Hz. To produce a 'base,' we low-pass filtered the original signal, also at 3000 Hz (see bottom panels of Figure 3). To us, the high-passed 'chirp' sounded like a can of rice being shaken, while the low-pass-filtered base sounded like a wooden door being slammed shut. (That is, the clanging of the metal door was largely absent.)
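As a concrete illustration of the stimulus construction, here is a minimal Python sketch, assuming the slam has been digitized into an array. The paper specifies only the 3000-Hz cutoff, so the Butterworth filter family, the filter order, and the use of zero-phase filtering are our assumptions.

from scipy.signal import butter, sosfiltfilt

def split_at_cutoff(slam, sample_rate_hz, cutoff_hz=3000.0, order=4):
    """Return (base, chirp): the low-passed and high-passed recording."""
    sos_low = butter(order, cutoff_hz, btype='lowpass', fs=sample_rate_hz, output='sos')
    sos_high = butter(order, cutoff_hz, btype='highpass', fs=sample_rate_hz, output='sos')
    base = sosfiltfilt(sos_low, slam)    # 'wooden door': clang largely removed
    chirp = sosfiltfilt(sos_high, slam)  # 'shaken rice can': the clang alone
    return base, chirp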

We asked sixteen listeners to identify the original metal door, the base and the chirp. The modal identifications of the metal door and the base included mention of a door; however, fewer than half the subjects reported hearing a door slam. Even so, essentially all of the identifications involved hard collisions of some sort (e.g., boots clomping on stairs, shovel banged on sidewalk). In contrast, no subject identified the chirp as a door sound, and no identifications described hard collisions. Most identifications of the chirp referred to an event involving shaking (tambourine, maracas, castanets, keys).


Figure 3. Display of stimuli used to obtain duplex perception of closing-door sounds. (Spectrograms of the original slam and the filtered versions; axes: time in seconds by frequency in kHz.)

Given our metal-door chirp and a high-pass-filtered wooden door slam, subjects could not identify which was a filtered metal door slam and which a filtered wooden door slam. Subjects were consistent in their labeling judgments, identifying one of the chirps as a metal door and the other as a wooden door. However, overall, more of them were consistently wrong than right. On average, they identified the metal-door chirp as the sound of a metal door 31% of the time and the wooden-door chirp as a metal door sound 79% of the time.9

To test for duplex perception and preemptiveness, we first trained subjects to identify the unfiltered door sound as a "metal door," the base as a "wooden door" and the upper frequencies of the door as a shaking sound. Next we tested them on stimuli created from the base and the chirp. We created 15 different diotic stimuli. All included the base, and almost all included the metal-door chirp. The stimuli differed in the intensity of the chirp. The chirp was attenuated or amplified by multiplying its digitized voltages by the following values: 0, .05, .1, .15, .2, .9, .95, 1, 1.05, 1.1, 4, 4.5, 5, 5.5 and 6. That is, there were 15 different intensities falling into three ranges; five were well below the natural intensity relationship of the chirp to the base, five were in the range of the natural intensity relation, and five were well above it. Three tokens of each of these stimuli were presented to subjects diotically in a randomized order. Listeners were told that they might hear one of the stimuli (metal door, wooden door or shaking sound), or sometimes two of them simultaneously, on each trial. They were to indicate what they heard on each trial by writing an identifying letter or pair of letters on their answer sheets.
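A minimal sketch of the stimulus set, in Python: the digitized chirp is scaled by the multipliers just listed and added to the base. The array names are illustrative; base and chirp stand for equal-length sampled waveforms.

import numpy as np

SCALES = [0, .05, .1, .15, .2,     # well below the natural chirp level
          .9, .95, 1, 1.05, 1.1,   # around the natural level
          4, 4.5, 5, 5.5, 6]       # well above it

def make_diotic_stimuli(base, chirp):
    """One stimulus per scale factor; 1.0 preserves the natural
    intensity relation of chirp to base."""
    base = np.asarray(base, dtype=float)
    chirp = np.asarray(chirp, dtype=float)
    return [base + s * chirp for s in SCALES]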

In our analyses, we have grouped responses to the 15 stimuli into three blocks of five. In Figure 4, we have labeled these Intensity Conditions low, medium, and high. Figure 4 presents the results as percentages of responses in the various response categories across the three intensity conditions. We show only the three most interesting (and most frequent) responses. The figure shows that the most frequent response for the low intensity condition is 'wooden door,' the label we asked subjects to use when they heard the base. The most frequent response for the medium condition is 'metal door,' the label we asked subjects to use when they heard the metal door slam. The preferred response for the high-intensity block of stimuli is overwhelmingly 'metal door + chirp,' the response that indicates a duplex percept. The changes in response frequency over the three intensity conditions for each response type are highly significant.
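The tally itself is simple arithmetic; a sketch of it in Python (the response format is our illustrative assumption):

from collections import Counter

def block_percentages(responses, n_stimuli=15, block_size=5):
    """`responses` maps each scale index (0-14) to the list of category labels
    collected over subjects and tokens. Returns one percentage table per
    intensity block (low, medium, high)."""
    tables = []
    for start in range(0, n_stimuli, block_size):
        labels = [label for i in range(start, start + block_size)
                  for label in responses[i]]
        counts = Counter(labels)
        total = sum(counts.values())
        tables.append({cat: 100.0 * n / total for cat, n in counts.items()})
    return tables  # e.g., [{'wooden door': ..., ...}, {...}, {...}]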


Figure 4. Percentage of responses falling in the three response categories, "wooden door," "metal door," and "metal door plus shaking sound," across three intensity ranges of the shaking sound.

Our results can be summarized as follows. First, at very low intensities of the upper frequencies of the door, subjects hear the base only. When the 'chirp' is amplified to an intensity at or near its natural intensity relation to the base, subjects report hearing a metal door the majority of the time. Further amplification of the 'chirp' leads to reports of the metal door and a separate shaking sound. The percept is duplex, and the metal door slam is preemptive.

There are several additional tests that we must run to determine whether our door slams, in fact, are perceived analogously to speech syllables in procedures revealing duplex perception. If we can show that they are, then we will conclude that an account of our findings that invokes a closed module is inappropriate. Evolution is unlikely to have anticipated metal door slams, and metal-door slams aren't profoundly biologically significant. We suggest alternatively that preemptiveness occurs when a chirp fills a "hole" in a simultaneously presented acoustic signal so that, together, the two parts of the signal specify some sound-producing distal event. If anything is left over after the hole is filled, the remainder is heard as separate.

Summary and concluding remarks

We have raised three challenges to the motor theory. We challenge their inference from evidence that phonetic gestures are perceived that speech perception involves access to the talker's own motor system. The basis for our challenge is a claim that dimensions of percepts always conform to those of distal events even in cases where access to an internal synthesizer for the events is unlikely. A second, related, challenge is to the idea that only some percepts are heteromorphic: just those for which we have evolved closed modules. When Liberman and Mattingly write that speech perception is heteromorphic, they mean heteromorphic with respect to structure in proximal stimulation, but they always mean as well that the percept is homomorphic with respect to dimensions of the distal source of the proximal stimulation. We argue that percepts are generally heteromorphic with respect to structure in proximal stimulation, but, whether they are or not, they are always homomorphic with respect to dimensions of distal events. Finally, we challenge the interpretation of duplex perception that ascribes it to simultaneous processing of one part of an acoustic signal by two modules. We suggest, instead, that duplex perception reflects the listener's parsing of acoustic structure into disjoint parts that specify, insofar as the acoustic structure permits, coherent distal events.

Where (in our view) does this leave the motor theory? It is fundamentally right in its claim that listeners perceive phonetic gestures, and also, possibly, in its claim that humans have evolved neural systems specialized for perception and production of phonetic gestures. It is wrong, we believe, specifically in its claims about what those specialized systems do, and generally in the view that closed modules must be invoked to explain why distal events are perceived.

Obviously, we prefer our own, direct-realist, theory, not so much because it handles the data better, but because, in our view, it fits better in a universal theory of perception. But however our theory may be judged in relation to the motor theory, we recognize that we would not have developed it at all in the absence of the important discoveries of the motor theorists that gestures are perceived.

REFERENCES

Abramson, A., & Lisker, L. (1985). Relative power of cues: Fo shift versus voice timing. In V. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 25-32). Orlando, FL: Academic Press.
Breckenridge, J. (1977). Declination as a phonological process. Bell Laboratories Technical Memorandum. Murray Hill, NJ.
Bregman, A. (1987). The meaning of duplex perception. In M. E. H. Schouten (Ed.), The psychophysics of speech perception (pp. 95-111). Dordrecht: Martinus Nijhoff.
Browman, C., & Goldstein, L. (1985). Dynamic modeling of phonetic structure. In V. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 35-53). Orlando, FL: Academic Press.
Browman, C., & Goldstein, L. (1986). Towards an articulatory phonology. In C. Ewan & J. Anderson (Eds.), Phonology Yearbook, 3 (pp. 219-254). Cambridge: Cambridge University Press.
Browman, C., & Goldstein, L. (in press a). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. Beckman (Eds.), Papers in laboratory phonology: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Browman, C., & Goldstein, L. (in press b). Gestural structures and phonological patterns. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Carney, P., & Moll, K. (1971). A cinefluorographic investigation of fricative consonant-vowel coarticulation. Phonetica, 23, 193-201.
Collins, S. (1985). Duplex perception with musical stimuli: A further investigation. Perception and Psychophysics, 38, 172-177.
Cooper, F., Delattre, P., Liberman, A., Borst, J., & Gerstman, L. (1952). Some experiments on the perception of synthetic speech sounds. Journal of the Acoustical Society of America, 24, 597-606.
Daniloff, R., & Hammarberg, R. (1973). On defining coarticulation. Journal of Phonetics, 1, 239-248.
Fitch, H., Halwes, T., Erickson, D., & Liberman, A. (1980). Perceptual equivalence of two acoustic cues for stop consonant manner. Perception and Psychophysics, 27, 343-350.
Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Folkins, J., & Abbs, J. (1975). Lip and jaw motor control during speech: Responses to resistive loading of the jaw. Journal of Speech and Hearing Research, 18, 207-220.
Folkins, J., & Abbs, J. (1976). Additional observations on responses to resistive loading of the jaw. Journal of Speech and Hearing Research, 19, 820-821.
Fowler, C. (1981). Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 24, 127-139.
Fowler, C. (1984). Segmentation of coarticulated speech in perception. Perception and Psychophysics, 36, 359-368.
Fowler, C. (1986a). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.
Fowler, C. (1986b). Reply to commentators. Journal of Phonetics, 14, 149-170.
Fowler, C., & Rosenblum, L. D. (in press). Duplex perception: A comparison of monosyllables and slamming of doors. Journal of Experimental Psychology: Human Perception and Performance.
Fowler, C., & Smith, M. (1986). Speech perception as "vector analysis": An approach to the problems of segmentation and invariance. In J. Perkell & D. Klatt (Eds.), Invariance and variability of speech processes (pp. 123-135). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fujimura, O. (1971). Remarks on stop consonants: Synthesis experiments and acoustic cues. In L. Hammerich, R. Jakobson, & E. Zwirner (Eds.), Form and substance (pp. 221-232). Copenhagen: Akademisk Forlag.
Gelfer, C., Harris, K., Collier, R., & Baer, T. (1985). Is declination actively controlled? In I. Titze (Ed.), Vocal-fold physiology: Physiology and biophysics of the voice. Iowa City: Iowa University Press.
Gelfer, C., Harris, K., & Baer, T. (1987). Controlled variables in sentence intonation. In T. Baer, C. Sasaki, & K. Harris (Eds.), Laryngeal function in phonation and respiration (pp. 422-432). Boston: College-Hill Press.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Haggard, M., Ambler, S., & Callow, M. (1970). Pitch as a voicing cue. Journal of the Acoustical Society of America, 47, 613-617.
Hockett, C. (1955). Manual of phonology (Publications in Anthropology and Linguistics, No. 11). Bloomington: Indiana University Press.
Honda, K. (1981). Relationship between pitch control and vowel articulation. In D. Bless & J. Abbs (Eds.), Vocal fold physiology (pp. 186-297). San Diego: College-Hill Press.
Jakobson, R., Fant, C. G. M., & Halle, M. (1951). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: MIT Press.
Johansson, G. (1973). Visual perception of biological motion. Perception and Psychophysics, 14, 201-211.

Kelso, J. A. S., Tuller, B., Vatikiotis-Bateson, E., & Fowler, C. (1984). Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance, 10, 812-832.

Kelso, J. A. S., Saltzman, E., & Tulles, B. (1986). The dynamicalperspective on speech production: Data and theory. Journal ofPhonetics,14,29-59.

Kimura, D. (1961). Cerebral dot:Mamiee and the perception ofverbal stimuli. Canadian Journal of Psychology, 15 166-171.

Lehiste, I. (1982). Some phonetic characteristics of discourse.Studio Linguistic., 36,117 -130.

Lehiste, I., & Peterson, G. (1961). Some basic considerations in theanalysis of intonation. Journal of the Acoustical Society of America,33,419-425.

Liberman, A. (1974). The specialization of the languagehemisphere (pp. 4356). In F. O. Schmitt & F. G. Worden (Eds.),The neurosciences: Third study program. Cambridge, MA: MITPress.

Liberman, A. (1982). On finding that speech is special. AmericanPsychologist, 37,148.167.

Liberman, A., Cooper, F., Harris, K., & MecNeilage, P. (1963). Amotor theory of speech perception. Proceedings of the SpeechCommunication Seminar, Stockholm: Royal Institute ofTechnology, D3.

Liberman, A., Cooper, F., Shankweller, D., & Studdert-Kennedy,M. (1967). Perception of the speech code. Psychological Review,74, 431-461.

Liberman, A., Isenberg, D., & Rekord, B. (1981). Duplexperception of cues for stop consonants: Evidence for a phoneticmodule. Perception and Psydtophysics, 30,133-143.

Liberman, A., & Mattingly, L (1985). The motor theory of speechperception revised. Cognition, 21,1-36.

Ltifqvist, A., Baer, T., McGarr, N. S., & Seider Story, R. (in press).The alcothyrold muscle in voicing control. Journal of theAcoustical Society of America.

MacDonald, M., & McCurk, H. (1978). Visual influences onspeech perception. Perception and Psychophysics,24, 253-257.

Maeda, S. (1976). A characterisation of American English intonation.Unpublished doctoral dissertation, Massachusetts In oitute ofTechnology .

Mann, V. (1980). Influence of preceding liquid on stop consonantperception. Perception and Psychopkysics, 28, 407-412.

Mann, V. & Liberman, A. (1983). Some differences betweenphonetic and auditory modes of perception. Cognition, 14, 211-235.

Massaro, D. (1987). Speech perception by ear and eye. A paradigm forpsychological inquiry. Hillsdale, NJ: Lawrence ErlbaumAssociates.

Mattingly, I. C., & Liberman, A. M. (1988). Specialized perceivingsystems for speech and other biologically-significant sounds. InG. Edelman, W. Gall, & W. Cowan (Eds.), Auditory function: Theneumbiological bases of hearing (pp. 775-793). New York Wiley.

Mattingly, L G, & Liberman, A. M. (in press). Speech and otherauditory modules. In G. Edelman, W. Call, & W. Cowan (Eds.),Signal and sense: Zocal and globes order in perceptual maps. NewYork: Wiley.

Milliken, R. (1984). Language and other biological categories.Cambridge, MA. MIT Press.

°hale, J. (1978). Production of tone. In V. Fromkin (Ed.), Tone: Alinguistic survey (pp. 5-39). New York Academic Press.

Ohala, J. (1981). The listener as a source of sound change. In C.Masek, R. Hendrick, & M. Miller (Eds.), Papers from theparasession on language and behavior (pp. 178-203). Chicago:Chicago Linguistics Society.

Ohde, R. (1984). Fundamental frequency as an acoustic correlateof stop consonant voicing. Journal of the Acoustical Society ofAmerica, 75,224-240.

Ohman, S. (1966) Coarticulation in VCV utterances:Spectrographic measures. Journal of the Acoustical Society ofAmerica, 39,151 -168.

Pastore, R., Schmuckler, M., Rosenblum, L D., & Szczesiul, R.(1983). Duplex perception with musical stimuli. Perception andPsychophysia, 33, 323-332.

Pierrehunibert, J. (1979). The perception of fundamentalfrequency. Journal of the Acoustical Society of America, 66, 363-369.

Porter, R. (1978). Rapid shadowing of 3/, :sables: Evidence forsymmetry of speech perceptual and -rotor systems. Paperpresented at the meeting of the Psyd.anoadcs Society, SanAntonio.

Porter, R., & Lubker, J. (1980). Rapid reproduction of vowel-vowelsequences: Evidence for a fast and direct acoustic-motoriclinkage in speech. Journal of Speeds aid Hearing Research, 23, 593-602

Rand, T. (1974). Dichotic release from masking for speech. Journalof the Acoustical Society of America, 55, 678-680.

Reed, E. & Jones, R. (1982). Reasons for realism: Selected essays ofJames I. Gawps. Hillsdale, NJ: Lawrence Erlbaum Associates.

Reinholt Peterson, N. (1986). Perceptual compensation for seg-mentally-conditioned fundamental-frequency perturbations,Phonetics, 43,31-42.

Remez, R., Rubin, P., Pisoni, D., & Carrell, T. (1981). Speechperception without traditional speech cues. Science, 212, 947-950.

Repp, B. (1987). The sound of two hands dapping: An exploratorystudy. Journal of the Acoustical Society of America, 81,1100-1109.

Repp, B., Milburn, C. & Ashkenas, J. (1983). Duplex perception:Confirmation of fusion. Perception and Psychophysics, 33, 333-338.

Rosenblum, L D. (1987). Towards an ecological alternative to themotor theory of speech perception. PAW Review (TechnicalReport of Center for the Ecological Study of Perception andAction, University of Connecticut), 2, 25-29.

Saltzman, E. (1986). Task dynamic coordination of the speecharticulators. In H. Heuer & C. Fromm (Eds.), Generation andmodulation of action patterns (pp. 129-144). (Experimental BrainResearch Series 15). New York Springer-Verlag.

Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task-c' gamic approach. Psychological Review, 94, 84-106.

Schiff, W. (1965). Perception of impending collision. PsychologicalMonographs. 79, No. 604.

Schiff, W., Caviness, J., lc Gibson, J. (1962). Persistent fearresponses In rhesus monkeys to the optical stimulus of"looming." Scienta,136, 982-983.

Shapiro, B., Grossman, M., & Gardner, H. (1981). Selectiveprocessing deficits in brain-cismaged populations.Neuropsycholgia, 19, 161-169.

Silverman, K. (1986). Fo segmental cues depend on intonation:The case of the rise after voiced stops. Phonetic, 43, 76-92.

Silverman, K. (1987). The structure and processing of fundamentalfrequency contours. Unpublished doctoral dissertation,Cambridge University.

sevens, K. (1960). Toward a model for speech recognition. Journalof the Acoustical Society of America, 32, 47-55.

Stevens, K., & Blumstein, S.(1981). The search for invariantcorrelates of acoustic features. In P. Eimas & J. Miller (Eds.),Perspectives on the study of speech (pp. 1-38). Hillsdale, NJ:Lawrence Erlbaum Associates.

Stevens, K., & Hale, M. (1967). Remarks on analysis by synthesisand distinctive features. In W. Wathen-Dunn (Ed.), Models forthe perception of speech and visual form pp. 88-102). Cambridge,MA: MIT Press.


Stoll, G. (1984). Pitch of vowels: Experimental and theoretical investigation of its dependence on vowel quality. Speech Communication, 3, 137-150.
Studdert-Kennedy, M. (1986). Two cheers for direct realism. Journal of Phonetics, 14, 99-104.
Studdert-Kennedy, M., & Shankweiler, D. (1970). Hemispheric specialization for speech perception. Journal of the Acoustical Society of America, 48, 579-594.
VanDerVeer, N. (1979). Ecological acoustics: Human perception of environmental sounds. Unpublished doctoral dissertation, Cornell University.
Warren, W., & Verbrugge, R. (1984). Auditory perception of breaking and bouncing events: A case study in ecological acoustics. Journal of Experimental Psychology: Human Perception and Performance, 10, 704-712.
Whalen, D. (1984). Subcategorical mismatches slow phonetic judgments. Perception and Psychophysics, 35, 49-64.
Whalen, D., & Liberman, A. (1987). Speech perception takes precedence over nonspeech perception. Science, 237, 169-171.

FOOTNOTES

*In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates, in press.

†Also Dartmouth College, Hanover, New Hampshire.
††Also University of Connecticut, Storrs. Now at the University of California at Riverside, Department of Psychology.
1There is a small qualification to the claim that listeners cannot tell what contributions visible and audible information each have to their perceptual experience in the McGurk effect. Massaro (1987) has shown that effects of the video display can be reduced but not eliminated by instructing subjects to look at, but to ignore, the display.

2Liberman et al. identify a cipher as a system in which each unique unit of the message maps onto a unique symbol. In contrast, in a code, the correspondence between message unit and symbol is not 1:1.

3Liberman et al. propose to replace the more conventional view of the features of a phoneme (for example, that of Jakobson, Fant & Halle, 1951) with one of features as "implicit instructions to separate and independent parts of the motor machinery" (p. 446).

4With one apparent slip on page 2: "The objects of speech perception are the intended phonetic gestures of the speaker, represented in the brain as invariant motor commands...."

5One can certainly challenge the idea that listeners recover the very gestures that occurred to produce a speech signal. Obviously there are no gestures at all responsible for most synthetic speech or for "sine-wave speech" (e.g., Remez, Rubin, Pisoni, & Carrell, 1981), and quite different behaviors underlie a parrot's or mynah bird's mimicking of speech. The claim that we argue is incontrovertible is that listeners recover gestures from speech-like signals, even those generated in some other way. (We direct realists [Fowler, 1986a, b] would also argue that "misperceptions" (hearing phonetic gestures where there are none) can only occur in limited varieties of ways, the most notable being signals produced by certain mirage-producing human artifacts, such as speech synthesizers, or by mimicking birds. Another possibility, however, includes signals produced to mimic those of normal speakers by speakers with pathologies of the vocal tract that prevent normal realization of gestures.)

6There are two almost orthogonal perspectives from which perception can be studied. On the one hand, investigators can focus on processes inside the perceiver that take place from the time that a sense organ is stimulated until a percept is achieved or a response is made to the input. On the other hand, they can look outside the perceiver and ask what, in the environment, the organism under study perceives, what information in stimulation to the sense organs allows perception of the things perceived, and finally, whether the organisms in fact use the postulated information. Here we focus on this latter perspective, most closely associated with the work of James Gibson (e.g., 1966; 1979; Reed & Jones, 1982).

7It is easy to find examples in which perception is heteromorphic with respect to the proximal stimulation and homomorphic with respect to distal events (looming, for example). We can also think of some examples in which perception appears homomorphic with respect to proximal stimulation, but in the examples we have come up with, they are homomorphic with respect to the distal event as well (perception of a line drawn by a pencil, for example), and so there is no way to decide whether perception is of the proximal stimulation or of the distal event. We challenge the motor theorists to provide an example in which perception is homomorphic with structure in proximal stimulation that is not also homomorphic with distal event structure. These would provide convincing cases of proximal stimulation perception.

8Bregman (1987) considers duplex perception to disconfirm his "rule of disjoint allocation" in acoustic scene analysis by listeners. According to the rule, each acoustic fragment is assigned in perception to one and only one environmental source. It seems, however, that duplex perception does not disconfirm the rule.

9Using a more sensitive AXB test, however, we have found that listeners can match the metal-door chirp, rather than a wooden-door chirp, to the metal door slam at performance levels considerably better than chance.


Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 118-138

Competence and Performance in Child Language*

Stephen Crain† and Janet Dean Fodor††

1. INTRODUCTION

This paper presents the results of our recent experimental investigations of a central issue in linguistic theory: which properties of human language are innately determined? There are two main sources of information to be tapped to find the answer to this question. First, universal properties of human languages are plausibly (even if not necessarily) taken to be innately determined. In addition, properties that emerge in children's language in the absence of decisive evidence in their linguistic input are reasonably held to be innate. Clearly, it would be most satisfactory if these two diagnostics for what is innate agreed with each other. In some cases they do. For example, there is a universal principle favoring transformational movement of phrases rather than of lexical categories, e.g., topicalization of noun phrases but not of nouns. To the best of our knowledge children abide by this principle; they may hear sentences such as Candy, you can't have now, but they don't infer that nouns can be topicalized. If they did, they would say things like Vegetables, I won't eat the. But this is not an error characteristic of children. Instead, from the moment they produce topicalized constructions at all, they apparently produce correct NP-topicalized forms such as The vegetables, I won't eat.

*This research was supported in part by NSF Grant BNS 84-18537, and by a Program Project Grant to Haskins Laboratories from the National Institute of Child Health and Human Development (HD-01994). The studies reported in this paper were conducted in collaboration with several friends and colleagues: Henry Hamburger, Paul Gorrell, Howard Lasnik, Cecile McKee, Keiko Murasugi, Mineharu Nakayama, Jaya Sarma and Rosalind Thornton. We thank them for their permission to gather this work together here.

In recent years, this happy convergence of results from research on universals and research on acquisition has been challenged by experimental studies reporting various syntactic failures on the part of children. The children in these experiments are apparently violating putatively universal phrase structure principles or constraints on transformations. Failure to demonstrate early knowledge of syntactic principles is reported by Jakubowicz (1984), Lust (1981), Matthei (1981, 1982), Phinney (1981), Roeper (1986), Solan and Roeper (1978), Tavakolian (1978, 1981) and Wexler and Chien (1985). Some explanation is clearly called for if a syntactic principle is respected in all adult languages but is not respected in the language of children.

Assuming that the experimental data accurately reflect children's linguistic competence, there are several possible responses to the unaccommodating data. The most extreme would be to give up the innateness claim for the principle in question. One might look for further linguistic data which show that it isn't universal. Or one might abandon the hypothesis that all universal principles are innate. For instance, Matthei (1981) obtained results that he interpreted as evidence that universal constraints on children's interpretation of reciprocals are learned, not innate. However, this approach is plausible only if one can offer some other explanation (e.g., a functional explanation) for why the constraints should be universal. But this is not always easy; as Chomsky (1986) has emphasized, many properties of natural language are arbitrary and have no practical motivation.

A different response to the apparent failure of children to respect constraints believed to be innate is to argue that the constraints are as yet inapplicable to their sentences. The claim is that as soon as a child's linguistic analyses have reached the level of sophistication at which a universal constraint becomes relevant, then that constraint will be respected. For example, Otsu (1981) has argued that children who give the appearance of violating a universal constraint on extraction may not yet have mastered the structure to which the constraint applies; they may have only some simpler approximation to the construction, lacking the crucial property that engages the constraint. We will discuss the evidence for this below.

A different approach also accepts the recalcitrant data as valid, but rejects the inference that the data are inconsistent with the innateness hypothesis. It is pointed out that it is possible for a linguistic principle to be innately encoded in the human brain and yet not accessible to the language faculty of children at early stages of language acquisition. The principle in question might be biologically timed to become effective at a certain maturational stage. Like aspects of body development (e.g., the secondary sex characteristics), linguistic principles might lie dormant for many years. One recent proposal invoking linguistic maturation, by Borer and Wexler (1987), contends that a syntactic principle underlying verbal passives undergoes maturational development. They maintain that before a critical stage of maturation is reached, children are unable to produce or comprehend passive sentences (full verbal passives with by-phrases).

It may eventually turn out that the innateness hypothesis must be augmented by maturation assumptions in certain cases. But such assumptions introduce new degrees of freedom into the theory, so its empirical claims are weakened. Unless some motivated predictions can be made about exactly when latent knowledge will become effective, a maturational approach is compatible with a much wider range of data than the simplest and strongest version of the innateness hypothesis, viz. that children have access to the same set of universal principles at all stages of language development. This more restricted position is the one to be adopted until or unless there is clear evidence to the contrary, e.g., clear evidence of a period or a stage at which all children violate a certain constraint, in all constructions to which it is applicable, simple as well as complex, and in all languages. So far, no such case has been demonstrated.1

Our research has taken a different approach. We argue that the experimental data do not unequivocally demonstrate a lack of linguistic knowledge. We do not deny that children do sometimes misinterpret sentences. But the proper interpretation of such failures is complicated by the existence of a variety of potentially confounding factors. Normal sentence comprehension involves lexical, syntactic, semantic, pragmatic and inferential abilities, and the failure of any one of these may be responsible for poor performance on an experimental task. It is crucial, therefore, to develop empirical methods which will distinguish between these various factors, so that we can determine exactly where a child's deficiencies lie. Until this has been done, one cannot infer from children's imperfect performance that they are ignorant of the grammar of their target language.

In fact it can be argued in many cases that it is non-syntactic demands of the task which are the cause of children's errors. We propose that task performance is weak at first and improves with age in large part because of maturation of non-linguistic capacities such as short term memory or computational ability, which are essential in the efficient practical application of linguistic knowledge. This does not deny that many aspects of language must be learned and that there is a time when a child has not yet learned them. But our interpretation of the data does make it plausible that young children know more of the adult grammar than has previously been demonstrated, and also, most significantly, that their early grammars do abide by universal principles.

In support of this non-linguistic maturation hypothesis, we have reexamined and supplemented a number of earlier experimental findings with demonstrations that nonsyntactic factors were responsible for many of the children's errors. The errors disappear or are greatly reduced when these confounding factors are suitably controlled for. In this paper, we report on a series of experimental studies along these lines concerned with three kinds of nonsyntactic factors in language performance: parsing, plans, and presuppositions. We will argue that these other factors are crucially involved in the experimental tasks by which children demonstrate their knowledge, and that they impose significant demands on children. If we underestimate the demands of any of these other components of the total task we thereby underestimate the extent of the child's knowledge of syntax. As a result, the current estimate of what children know about language is misleading.


2. CHILDREN'S ERRORS IN COMPREHENSION

In this section we attempt to identify and isolate several components of language-related skills, in order to gain a better understanding of each, and to clarify the relationship between the innateness hypothesis and early linguistic knowledge. Very little work has been done on this topic. The majority of language development studies seem to take it for granted that the experimental paradigms provide a direct tap into the child's linguistic competence. An important exception is a study by Goodluck and Tavakolian (1982), in which improved performance on a relative clause comprehension task resulted from simplification of other aspects of the syntax and semantics of the stimulus sentences (i.e., the use of intransitive rather than transitive relative clauses, and relative clauses with one animate and one inanimate noun phrase rather than two animates). The success of these manipulations is exactly in accord with our general hypothesis about the relation between competence and performance. As other demands on the child's performance are reduced, greater competence is revealed.

Our experiments focus on three factors involved in many child language experiments which may interfere with estimation of the extent of children's linguistic knowledge in tasks which are designed to measure sentence comprehension. These factors (parsing, presupposition and plans) are of interest in their own right, but have received very little attention in previous research on syntax acquisition. In this section we will review our recent work on these topics. In the following section we will turn to an alternative research strategy for assessing children's knowledge, the technique of elicited production.

Parsing. Sentence parsing is a complex task which is known to be governed (in adults) by various decision strategies that favor one structural analysis over another where both are compatible with the input word sequence. Even adults make parsing errors, and it would hardly be surprising, given the limited memory and attention spans of children, to discover that they do too. These parsing preferences must somehow be neutralized or factored out of an experimental task whose objective is the assessment of children's knowledge of syntactic rules and constraints.

Plans. The formation of an action plan is an important aspect of any comprehension task involving the manipulation of toys or other objects. If the plan for manipulating objects appropriately in response to a test sentence is necessarily complex to formulate or to execute, its difficulty for a child subject may mask his correct comprehension of the sentence. Thus we need to develop a better understanding of the nature and relative complexity of such plans, and also to devise experimental paradigms in which their impact on performance is minimized.

Presuppositions. A variety of pragmatic considerations must also be taken into account, such as the contextual fixing of deictic reference, obedience to cooperative principles of conversation, and so forth. In particular, our research suggests that test sentences whose pragmatic presuppositions are unsatisfied in the experimental situation are also unlikely to provide results allowing an accurate assessment of a child's knowledge of syntactic principles. It is necessary to establish which kinds of presuppositions children are sensitive to, and to ensure that these are satisfied in experimental tasks.

2.1. Parsing

2.1.1. Subjacency. One universal constraint which should be innate is Subjacency. Subjacency prohibits extraction of constituents from various constructions, including relative clauses. However, in an experimental study by Otsu (1981), many children responded as if they allowed extraction from relative clauses in answering questions about the content of pictures. For example, children saw a picture of a girl using a crayon to draw a monkey who was drinking milk through a straw. They were then asked to respond to question (1).

(1) What is Mary drawing a picture of a monkey that is drinking milk with?

Otsu found that many children responded to (1) in a way that appeared to violate Subjacency. In this case, the answer that is in apparent violation of Subjacency is "a straw." This is because "a straw" is appropriate only if the what has been moved from a position in the monkey drinking milk clause as shown in (2a), rather than from the Mary drawing picture clause as shown in (2b).

(2) a) *What is Mary drawing a picture of a monkey [that is drinking milk with _]?

b) What is Mary drawing a picture of a monkey [that is drinking milk] with _?


But the monkey drinking milk clause is a relative clause, and Subjacency prohibits the what from moving out of it. Thus the only acceptable structure is (2b), and the only acceptable answer is "a crayon." If these data are interpreted solely in terms of children's grammatical knowledge, the conclusion would then have to be that knowledge of Subjacency sets in quite late in at least some children.
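For concreteness, the logic of the constraint can be caricatured in a few lines of Python. This is a deliberately simplified rendering, not a serious grammar: real Subjacency is stated over individual movement steps, and here we simply count the bounding nodes (NP and S, the standard assumption for English) on a hand-encoded path from the gap up toward the landing site.

BOUNDING = {'NP', 'S'}

def subjacent(path_from_gap):
    """True if at most one bounding node separates the filler from its gap."""
    return sum(1 for node in path_from_gap if node in BOUNDING) <= 1

# (2b): gap in the main clause, 'with _' attached to 'drawing a picture'
print(subjacent(['S', 'VP', 'PP']))                   # True  -> grammatical
# (2a): gap inside the relative clause, crossing NP and the relative S
print(subjacent(['S', 'VP', 'NP', 'S', 'VP', 'PP']))  # False -> blocked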

As we noted earlier, Otsu suggested that the innateness of Subjacency could be salvaged by showing that the children who appeared to violate Subjacency had not yet mastered the phrase structure of relative clauses (of sufficient complexity to contain an extractable noun phrase). When he conducted an independent test of knowledge of relative clause structure, he found, as predicted, a correlation between phrase structure and Subjacency application in the children's performance. However, the children's performance was still surprisingly poor: 25% of the children who were deemed to have mastered relative clauses gave responses involving ungrammatical Subjacency-violating extractions from relative clauses.

We have argued (Crain & Fodor, 1984) for analternative analysis of Otsu's data, which makes itpossible to credit children with knowledge of bothphrase structure principles and constraints ontransformations from an early age. We claim thatchildren's parsing routines can influence theirperformance on the kind of sentences used in theSubjacency test; in particular, that there arestrong parsing pressures encouraging subjects tocompute the ungrammatical analysis of suchsentences. Until a child develops sufficientcapacity to override these parsing pressures, theymay mask his syntactic competence, making himlook as if he were ignorant of Subjacency.

A powerful general tendency in sentence parsing by adults is to attach an incoming phrase low in the phrase marker if possible. This has been called Right Association; see Kimball (1973), Frazier and Fodor (1978). In sentence (3), for example, the preferred analysis has with NP modifying drinking milk rather than modifying drawing a picture, even though in this case both analyses are grammatically well-formed because there has been no WH-movement.

(3) Mary is drawing a picture of a monkey that is drinking milk with NP.

To see how strong this parsing pressure is, note how difficult it is to get the sensible interpretation of (3) when a crayon is substituted for NP. This Right Association preference is still present if the NP in (3) is extracted, as in (1). The word with in (1) still coheres strongly with the relative clause, rather than with the main clause. The result is that the analysis of (2) that is most immediately apparent is the ungrammatical (2a), in which what has been extracted from the relative clause. Since this 'garden path' analysis is apparent to most adults, it is hardly surprising if some of Otsu's child subjects were also tempted by it and responded to (1) in the picture verification task by saying "a straw" rather than "a crayon."
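As a toy rendering of the strategy in Python: when an incoming word could attach to more than one constituent still open on the parser's stack, Right Association picks the one opened most recently, which for (1) is the relative clause and hence the garden-path analysis (2a). The site names are illustrative.

def right_association(open_sites):
    """`open_sites` lists candidate attachment sites in the order their
    clauses were opened; prefer the most recently opened (lowest) one."""
    return open_sites[-1]

# Attachment choices for the stranded 'with' in (1)/(3):
sites = ['drawing a picture (main clause)', 'is drinking milk (relative clause)']
print(right_association(sites))  # the relative clause -> analysis (2a)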

We conducted several experiments designed to establish the plausibility of this claim that the relatively poor performance of children on sentences like (1) is due to parsing pressures rather than to ignorance of universal constraints. In the first experiment, we tested children and adults on complement-clause questions as in (4). Subjacency does not prohibit extraction from complement clauses, so if there were no Right Association effects this sentence should be fully ambiguous, with both interpretations equally available.

(4) What is Bozo watching the dog jump through?

That is, given a picture in which Bozo the clown is looking through a keyhole at a dog jumping through a hoop, it would be correct to say either "the keyhole" or "a hoop." Intuitively, though, the interpretations are highly skewed for adults, with a strong preference for the Right Association interpretation ("hoop") in which the preposition attaches within the lower clause. Our experiment showed that the same is true for children. We tested twenty 3- to 5-year-olds (mean age 4;6) on these sentences using a picture verification task just like Otsu's, and 90% of their responses were in accord with the Right Association interpretation.2

Thus children and adults alike are strongly swayed by Right Association. This is an important result. To the best of our knowledge the question of whether children's parsing strategies resemble those of adults has not previously been investigated. But children certainly should show the same preferences as adults, if the human sentence parsing mechanism is innately structured. And the parsing mechanism certainly should be innately structured, because it would be pointless to be born knowing a lot of facts about language if one weren't also born knowing how to use those facts for speaking and understanding. It is satisfying, then, to have shown that children exhibit Right Association. And the fact that they do offers a plausible explanation for why so many of them failed Otsu's Subjacency test: they were listening to their parsers rather than to their grammars.

Our other experiments in support of this conclusion were designed to show that even people whose knowledge of Subjacency is not in doubt (i.e., adults) are also tempted to violate Subjacency when it is in competition with Right Association. We ran Otsu's Subjacency test on adults just as he did with the children. The adults gave Subjacency-violating low attachment responses to 21% of these questions. This was not quite as high a rate as for the children, but as we have noted, adults surely have a greater capacity than children do for checking an illicit analysis and shifting to a less preferred but well-formed analysis before they commit themselves to a response. In an attempt to equalize adult self-monitoring capacities with those of children, we re-ran the Subjacency experiment with an additional distracting task (listening for a designated phoneme in the stimulus sentence). Under these conditions the adults gave Subjacency-violating responses to 29% of the relative clause constructions, a slightly higher rate than the 25% for Otsu's child subjects.

Escalating still further, we changed the sentences so that the grammatically well-formed analysis was semantically or pragmatically anomalous, as in (5).

(5) What color hat is Barbara drawing a picture of an artist with?

Under these circumstances, where the semantics clearly favored the Subjacency-violating analysis, 75% of adults' responses violated Subjacency. This makes it very clear that linguistic competence may not always be revealed by linguistic performance.

Finally, we ran another study, in which we asked adults to classify sentences as ambiguous or unambiguous. The sentences were spoken in turn with only a few seconds between them, and there were 72 of them, so the task was fairly demanding. The materials included complement questions like (4) and relative clause questions like (1), as well as ambiguous and unambiguous control sentences of many varieties. The results showed a 62% ambiguity detection rate for the ambiguous control sentences, with a 16% 'false alarm' rate for the unambiguous control sentences. Thus the subjects were able to cope with the task tolerably well, though not perfectly.

What was interesting was that the ambiguity of the complement questions was detected only 48% of the time, in line with our claim that Right Association obscures the alternative reading with the prepositional phrase in the main clause. And most interesting of all was that 80% of the relative clause questions were judged to be ambiguous, even though Subjacency prohibits one analysis and renders them unambiguous. Our explanation for this extraordinary result is that the subjects first computed the Right Association analysis favored by their parsing routines, then recognized that this was unacceptable because of Subjacency, and so rejected it in favor of the analysis with the prepositional phrase in the main clause. We assume that it was this rapid shift from one analysis to the other that gave our subjects such a strong impression that these sentences were ambiguous. Note that if this misanalysis-with-revision occurs 80% of the time for adults, only a slight handicap in children's ability to revise would be sufficient to account for their errors.

To sum up: we still have no positive proof that Subjacency is innate, but at least now there is no evidence against it. Our experiments make it plausible that children as young as can be tested are like adults both with respect to their knowledge of this universal constraint and with respect to their parsing routines; they are just not very good yet at coping with conflicts between the two.

2.1.2. Backward pronominalization. A fundamental constraint on natural language is the structure dependence of linguistic rules. The innateness hypothesis implies that children's earliest grammars should also exhibit structure dependence, even if their linguistic experience happens to be equally compatible with structure-independent hypotheses. However, it has been proposed that children initially hypothesize a structure-independent constraint on anaphora, prohibiting all cases of backward pronominalization (Solan, 1983).3 Backward pronominalization consists of coreference between a noun phrase and a preceding pronoun, as indicated by the indices in (6).

(6) That he_i kissed the lion made the duck_i happy.

We will argue that children do in fact permit backward pronominalization, subject to structure-dependent constraints. We contend that the appearance of a general restriction against backward pronominalization is due to a parsing preference for the alternative 'extra-sentential' reading of the pronoun in certain comprehension tasks. The results of a new comprehension methodology show that children as young as 2;10 admit the same range of interpretations for pronouns as adults do.

Two sources of evidence have been cited for the claim that children up to 5 or 6 years uniformly reject backward pronominalization. First, children who are asked to repeat back a sentence such as (7) often respond by converting it into a forward pronominalization construction, as in (8) (Lust, 1981).

(7) Because she was tired, Mommy was sleeping.

(8) Because Mommy was tired, she was sleeping.

The fact that these children took the trouble to exchange the pronoun and its antecedent certainly indicates that they disfavor backward pronominalization in their own productions. But it does not show that the backward pronominalization interpretation is not compatible with the child's grammar, as suggested by Solan (1983). To the contrary, the conversion of (7) to (8) shows that children do accept backward pronominalization in comprehension; for they would think of (8) as an acceptable variant of (7) only if they were interpreting the pronoun in (7) as coreferential with the subsequent lexical noun phrase (Lasnik & Crain, 1985).

Second, it has been found that when the acting-out situation for a sentence like (6) includes a potential referent for he other than the duck (e.g., a farmer), this unmentioned object is usually favored by the children as the referent of the pronoun (Solan, 1983; Tavakolian, 1978). In contrast to the prevailing view, we would attribute this to a parsing preference for the extra-sentential interpretation of the pronoun; it does not have to be taken as evidence that children have a grammatical prohibition against backward anaphora. Our suggestion, then, is that children's knowledge might be comparable to that of adults, even if their performance differs.

It is particularly important to keep this distinction in mind for potentially ambiguous constructions such as these. When a sentence has more than one possible interpretation, the interpretation that children select can tell us which interpretation they prefer; it cannot show that others are unavailable to them. After all, adults also exhibit biases in connection with ambiguous constructions, but this does not lead us to accuse them of ignorance of alternative interpretations. To establish how much children actually do know, we should look for the factors that might be biasing their interpretations, and also for ways of minimizing this bias so that interpretations which are less preferred but nevertheless acceptable to them have a chance of showing through.

The most likely general source of bias against backward pronominalization is the fact that interpretation of the pronoun would have to be delayed until the antecedent is encountered later in the sentence. This retention of uninterpreted items may strain a child's limited working memory. There is some evidence for this speculation. Hamburger and Crain (1984) have noted that children show a tendency to interpret adjectives immediately, without waiting for the remainder of the noun phrase, even in cases where this leads them to give incorrect responses. And Clark (1971) has observed errors attributable to children's tendency to act out a clause immediately without waiting for other clauses in the sentence. The only way to interpret the pronoun immediately in a sentence like (6) is to assign it an extra-sentential referent, as children typically do.

If this proposal is correct, children should accept backward pronominalization in an experimental task that presses subjects to access every interpretation they can assign to a sentence. Crain and McKee (1985) used a true/false paradigm in which subjects judge the truth value of sentences against situations acted out by the experimenter. The sentences were as in (9), where either a coreferential reading or an extra-sentential reading of the pronoun is possible.

(9) When he went into the barn, the fox stole the food.

On each trial, a child heard a sentence following a staged event acted out by one of two experimenters, using toy figures and props. The second experimenter manipulated a puppet, Kermit the Frog. Following each event, Kermit said what he thought had happened on that trial. The child's task was to indicate whether or not the sentence uttered by Kermit accurately described what had happened. Children were asked to feed Kermit a cookie if he said the right thing, that is, if what he said was what really happened. In this way, 'true' responses were encouraged in the experimental situation. But sometimes Kermit would say the wrong thing, if he wasn't paying close attention. When this happened, the child was asked to make Kermit eat a rag. (In pilot work without the rag ploy, we had found that children were reluctant to say that Kermit had said something wrong.)

To test for the availability of both interpretations of an ambiguous sentence like (9), children judged it twice during the course of the experiment, once following a situation in which a fox stole some chickens from inside a barn (for the backward pronominalization interpretation), and once following a situation in which a man stole some chickens while a fox was in a barn (for the extra-sentential interpretation).

Children accepted the backward anaphora reading for all the ambiguous sentences 73% of the time. The extra-sentential reading was accepted 81% of the time, but the difference was not significant. Much the same results were obtained even for the 7 youngest children, whose ages were from 2;10 to 3;4. Only two of the 62 subjects consistently rejected the backward anaphora reading. Thus most children find the backward anaphora reading acceptable, although it might not be preferred if they were forced to choose between interpretations, as in previous comprehension studies.

We should note that a variety of control sentences were also tested to rule out other, less interesting, explanations of the children's performance. For example, the children rejected sentence (10) following a situation in which Strawberry Shortcake did eat an ice cream, but not while she was outside playing. This shows that they were not simply ignoring the subordinate clauses of sentences in deciding whether to accept or reject them.

(10) When she was outside playing, Strawberry Shortcake ate an ice cream.

Sentences like (11) were also tested in order to establish that subjects were not merely giving positive responses to all sentences, regardless of their grammatical properties.

(11) He stole the food when the fox went into the barn.

The difference between (11) and the acceptable backward pronominalization in (9) is that in (11) the pronoun is in the higher clause and c-commands the fox, while in (9) the pronoun is in the subordinate clause and does not c-command the fox. (A node A in a phrase marker is said to c-command a node B if there is a route from A to B which goes up to the first branching node above A, and then down to B. Note that c-command is a structure-dependent relation.) There is a universal constraint that prohibits a pronoun from c-commanding its antecedent. And indeed the children did reject (11) 87% of the time. Note that this positive result shows that the children have early knowledge not only of the absence of linear sequence conditions on pronominalization, but also of the existence of structural conditions such as c-command. (See also Lust, 1981, and Goodluck, 1986.)
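The c-command relation just defined is easy to state procedurally. The following sketch (our illustration; the tree shapes are simplified) computes c-command on a phrase marker and reproduces the contrast between (11) and (9).

class Node:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self

def dominates(a, b):
    return b is a or any(dominates(c, b) for c in a.children)

def c_commands(a, b):
    n = a.parent
    while n is not None and len(n.children) < 2:  # climb to first branching node above A
        n = n.parent
    return n is not None and dominates(n, b) and not dominates(a, b)

# (11) "He stole the food when the fox went into the barn."
fox11, he11 = Node("NP: the fox"), Node("NP: he")
s11 = Node("S", [he11, Node("VP", [Node("V: stole the food"),
          Node("S'", [Node("C: when"),
                      Node("S", [fox11, Node("VP: went into the barn")])])])])
print(c_commands(he11, fox11))  # True  -> coreference blocked in (11)

# (9) "When he went into the barn, the fox stole the food."
he9, fox9 = Node("NP: he"), Node("NP: the fox")
s9 = Node("S", [Node("S'", [Node("C: when"),
                            Node("S", [he9, Node("VP: went into the barn")])]),
                Node("S", [fox9, Node("VP: stole the food")])])
print(c_commands(he9, fox9))    # False -> backward coreference allowed in (9)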

2.1.3. Subject/Auxiliary Inversion. Another study (Crain & Nakayama, 1987) also explored the tie between children's errors in acquisition tasks and sentence processing problems. This study was designed to test whether children give structure-dependent or structure-independent responses when they are required to transform sentences by performing Subject/Auxiliary inversion. As Chomsky (1971) pointed out, transformational rules are universally sensitive to the structural configurations in the sentences to which they apply, not just to the linear sequence of words.

The procedure in this study was simply for the experimenter to preface declaratives like (12) with the carrier phrase "Ask Jabba if ...," as in (13).

(12) The man who is running is bald.

(13) Ask Jabba if the man who is running is bald.

The child then had to pose the appropriate yes/no questions to Jabba the Hutt, a figure from "Star Wars" who was being manipulated by one of the experimenters. Following each question, Jabba was shown a picture and would respond "yes" or "no."

The sentences all contained a relative clause modifying the subject noun phrase. The correct structure-dependent transformation moves the first verb of the main clause to the front of the sentence, past the whole subject noun phrase, as in (14). An incorrect, structure-independent transformation would be as in (15), where the linearly first verb in the word string (which happens to be the verb of the relative clause) has been fronted.

(14) Is the man who is running bald?

(15) *Is the man who running is bald?

For simple sentences with only one clause such as (16), which are more frequent in a young child's input, both versions of the transformation rule give the correct result.

(16) Ask Jabba if the man is bald.

(17) Is the man bald?


It is only on the more complex sentences that the form of the child's rule is revealed.
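The contrast between the two candidate rules can be made explicit. In this sketch (ours; a toy word-level simulation, not the authors' formalism), the structure-independent rule fronts the linearly first auxiliary, while the structure-dependent rule treats the subject noun phrase, relative clause and all, as a single constituent and fronts the auxiliary that follows it. The two rules agree on (16) but diverge on (12).

def front_first_aux(words):
    # Structure-independent: move the linearly first 'is' to the front.
    i = words.index("is")
    return [words[i]] + words[:i] + words[i + 1:]

def front_main_aux(subject_np, rest):
    # Structure-dependent: front the auxiliary that follows the whole
    # subject NP constituent.
    return [rest[0]] + subject_np + rest[1:]

# (16)/(17): both rules yield the same correct question.
print(" ".join(front_first_aux("the man is bald".split())))
print(" ".join(front_main_aux("the man".split(), "is bald".split())))
# -> is the man bald (both)

# (12): the rules diverge.
print(" ".join(front_first_aux("the man who is running is bald".split())))
# -> is the man who running is bald   (the ungrammatical (15))
print(" ".join(front_main_aux("the man who is running".split(), "is bald".split())))
# -> is the man who is running bald   (the correct (14))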

The outcome was as predicted by the innateness hypothesis: children never produced an incorrect sentence like (15). Thus, a structure-independent strategy was not adopted in spite of its simplicity and in spite of the fact that it produces the correct question forms in many instances. The findings of this study thus lend further support to the view that the initial state of the human language faculty contains structure-dependence as an inherent property.

The children did make some errors in this experiment, and we observed that most of them were in sentences with a long subject noun phrase and a short main verb phrase, as in (18).

(18) Is the boy who is holding the plate crying?

By contrast, there were significantly fewer errors in sentences like (19), which has a shorter subject noun phrase and a longer verb phrase.

(19) Is the boy who is unhappy watching Mickey Mouse?

This kind of contrast is familiar in parsing studies with adults. In particular, Frazier and Fodor (1978) showed that a sequence consisting of a long constituent followed by a short constituent is especially troublesome for the (adult) parsing routines;4 a short constituent before a long one is much easier to parse. The distribution of the children's errors in the Subject-Auxiliary Inversion task may therefore be indicative not of inadequate knowledge of the inversion rule, but of an adult-like processing sensitivity to interactions between structure and constituent length.

A follow-up study to test this possibility was conducted by Nakayama (1987). Nakayama systematically varied both the length and the syntactic structure of the sentences to be transformed by the children. The children made significantly fewer Subject-Auxiliary Inversion errors in response to embedded questions with short relative clauses (containing intransitive verbs) as compared to those with long relative clauses (containing transitive verbs). With length held constant, the children had more difficulty with relative clauses that had object gaps, as in (20), than with relative clauses that had subject gaps, as in (21) (although this effect was not quite significant).

(20) The ball the girl kicked is rolling.

(21) The boy who was slapped is crying.

The ease of subject gap constructions, as compared to object gap constructions, has been found in a number of other studies in language development, in language impaired populations, and in experiments on adult sentence processing (where the question of syntactic competence is not in doubt). It seems reasonable to interpret these results as confirming that children's error rates in language tests are highly sensitive to the complexity of the sentence parsing that is required.

2.2. Presupposition

Syntactic parsing is not the only factor that has been found to mask knowledge of syntactic principles. Test sentences whose pragmatic presuppositions are unsatisfied in the experimental situation have been found to result in inaccurate assessments of children's structural knowledge. In this section we consider two experiments that point to the relevance of presuppositional content in sentence understanding.

The structures we discuss here are relative clauses and temporal adverbial clauses. A word of clarification is needed before we proceed. Up till now we have restricted the scope of the innateness hypothesis to universal constraints (like Subjacency, and structure-dependence), which could not in principle be learned from normal linguistic experience (i.e., without extensive corrective feedback). But now we want to extend the innateness hypothesis to a broader class of linguistic knowledge, knowledge of universal types of sentence construction. We cannot plausibly claim that every aspect of these constructions is innate. Rather, every construction will have some aspects that are determined by innate principles, and other aspects that must be learned. And the balance between these two elements varies from construction to construction. So it is perfectly acceptable on theoretical grounds that some constructions should be acquired later than others. However, the innateness hypothesis is not compatible with just any order of acquisition. It predicts early acquisition of constructions that Chomsky calls 'core' language, i.e., the constructions that have strong assistance from innate principles with just a few parameters to be set by learners on the basis of experience. It would be surprising to discover that knowledge of these constructions was significantly delayed once the relevant lexical items had been learned. In the absence of a plausible explanation, this would put the innateness hypothesis at risk.


We noted in section 1 a range of possible explanations of apparently delayed knowledge of linguistic facts. In the present case they would include the following:

the construction does not, after all, belong to the core but is 'peripheral' and hence should be acquired late;

children don't hear this construction until quite late in the course of language development and so could not be expected to know it exists;

the core principles in question undergo maturation and so are not accessible at early stages of acquisition;

the experimental data are faulty and children do indeed have knowledge of this construction.

We will argue for this last alternative. And just as in the previously described studies of innate constraints, we will lay the blame for the misleading experimental data on the fact that traditional experimental paradigms do not make sufficient allowance for the limited memory and computational capacities of young children. Once again, our story is that non-linguistic immaturity can create the illusion of linguistic immaturity.

2.2.1. Relative Clauses. Children typically make more errors in understanding sentences containing relative clauses (as in 22) than sentences containing conjoined clauses (as in 23), when comprehension is assessed by a figure manipulation (act-out) task.

(22) The dog pushed the sheep that jumped over the fence.

(23) The dog pushed the sheep and jumped over the fence.

The usual finding that (22) is more difficult for children than (23) up to age 6 years or so has been interpreted as an instance of late emergence of the rules for subordinate syntax in language development (e.g., Tavakolian, 1981). However, though coordination may be innately favored over subordination, it is also true that subordination is ubiquitous in natural language; relative clause constructions are very close to the 'core.' So ignorance of relative clauses until age 6 would stretch the innateness hypothesis.

Fortunately this is not how things stand. Hamburger and Crain (1982) showed that the source of children's performance errors on this task is not a lack of syntactic knowledge. By constructing pragmatic contexts in which the presuppositions of restrictive relative clauses were satisfied, they were able to demonstrate mastery of relative clause structure by children as young as 3 years. There are two presuppositions in (22): (i) that there are at least two sheep in the context, and (ii) that one (but only one) of the sheep jumped over a fence prior to the utterance. The reason why previous studies failed to demonstrate early knowledge of relative clause constructions, we believe, is that they did not pay scrupulous attention to these pragmatic presuppositions. For example, subjects were required to act out the meaning of a sentence such as (22) in contexts in which only one sheep was present. The poor performance by young children in these experiments was attributed to their ignorance of the linguistic properties of relative clause constructions. But suppose that a child did know the linguistic properties, but that he also was aware of the associated presuppositions. Such a child might very well be unable to relate his correct understanding of the sentence to the inappropriate circumstances provided by the experiment. Adult subjects may be able to 'see through' the unnaturalness of an experimental task to the intentions of the experimenter, but it is not realistic to expect this of young children.

Following this line of reasoning, Hamburger and Crain (1982) made the apparently minor change of adding two more sheep to the acting out situation for sentence (22), and obtained a much higher percentage of correct responses. The most frequent remaining 'error' was failure to act out the event described by the relative clause, but since felicitous usage presupposes that this event has already occurred, this is not really an error but is precisely the kind of response that is compatible with perfect comprehension of the sentence. This interpretation of the data is supported by the fact that there was a positive correlation between incidence of this response type and age.5

We have conducted another series of studies on relative clauses, trying several other techniques for assessing grammatical competence. In one study, we employed a picture verification paradigm to see if children could distinguish relative clauses from conjoined clauses, despite the claim of Tavakolian (1981) that they systematically impose a conjoined clause analysis on relatives. In this study, seventeen 3- and 4-year-olds responded to relative clause constructions like (24).

(24) The cat is holding hands with a man who is holding hands with a woman.

(25) The cat is holding hands with a man and is holding hands with a woman.


This sentence was associated with a pair of pictures, one that was appropriate to it and one that was appropriate to the superficially similar conjoined sentence (25). Seventy percent of the 3-year-olds' responses and 94% of the 4-year-olds' responses matched sentences with the appropriate picture rather than with the one depicting the conjoined clause interpretation.

A second technique we tried used a 'silliness' judgment task (see Hsu, 1981) to establish whether children can differentiate relative clauses from conjoined clauses. Ninety-one percent of the responses of the twelve 3- and 4-year-olds tested categorized as 'silly' sentences such as (26), although sentences such as (27) were accepted as sensible 87% of the time.

(26) The horse ate the hay that jumped over the fence.

(27) The man watched the horse that jumped over the fence.

Notice that sentence (26) would not be anomalous if the that-clause were misinterpreted as an and-clause, or if it were interpreted as extraposed from the subject NP; in both cases, the horse would be the understood subject of the relative clause. The results therefore indicate that most children interpret the that-clause in this sentence correctly, i.e., as a subordinate clause modifying the hay. Informal testing of adults suggests that the only respect in which children and adults differ on the interpretation of relative clauses is that the adults are somewhat more likely to accept the extraposed relative analysis as well, though even for adults this analysis is much less preferred.

A third experiment, on the phrase structure of relative clause constructions, indicates that children, like adults, treat a noun phrase and its modifying relative clause as a single constituent, inasmuch as they can construe it as the antecedent for a pronoun such as one.

In a picture verification study, fifteen 3- to 5-year-olds responded to the instructions in (28).

(28) The mother frog is looking at an airplane that has a woman in it. The baby frog is looking at one too. Point to it.

Ninety-three percent of the time the subjects chose the picture in which the baby frog was looking at an airplane with a woman in it, in preference to the picture in which the baby frog was looking at an airplane without a woman in it. That is, the relative clause was included in the noun phrase assigned as antecedent to the pronoun.

In short: the weight of evidence now indicates that children grasp the structure and meaning of relative clause constructions quite early in the course of language acquisition, as would be expected in view of the central position of these constructions in natural language.

2.2.2. Temporal Terms. Another line of research has yielded support for the claim that presupposition failure is implicated in children's poor linguistic performance. These studies employed sentences containing temporal clauses with before and after, as in (29).

(29) Push the red car to me before/after you push the blue car.

Clark (1971) and Amidon and Carey (1972) have claimed that most normal 3- to 5-year-olds do not understand these sentences appropriately. Since Amidon and Carey established that the children were familiar with concepts of temporal sequence (e.g., as expressed by words like first and last), the implication is that the structure of these adverbial clauses is beyond the scope of the child's grammar at this age.

However, the acting-out tasks employed in these studies were once again unnatural ones which ignored the presuppositional content of the test sentences. Felicitous usage of sentence (29) demands that the pushing of the blue car has already been contextually established by the hearer as an intended, or at least probable, future event; but this was not established in these experimental tasks. It is very likely, then, that these studies underestimated children's ability to comprehend temporal subordinate clauses. For example, Amidon and Carey reported that five- and six-year-old children who were not given any feedback frequently failed to act out the action described in the subordinate clause. Johnson (1975) found that four- and five-year-old children correctly acted out commands such as those in (30) only 51% of the time; again, the predominant error was failure to act out the action described in the subordinate clause.

(30) a. Push the car before you push the truck. (S1 before S2)

b. After you push the motorcycle, push the bus. (After S1, S2)

c. Before you push the airplane, push the car. (Before S2, S1)

d. Push the truck after you push the helicopter. (S2 after S1)


Crain (1982) satisfied the presupposition of the subordinate clause by having the subordinate clause act correspond to an intended action by the subject, and observed a striking improvement in children's performance. To satisfy the presupposition, children were asked, before each command, to choose a toy to push on the next trial. The child's intention to push a particular toy was incorporated into the command that was given on that trial. For instance, sentence (30d) could be used felicitously for a child who had expressed his intent to push the helicopter. Correct responses (i.e., responses in which both the main clause and subordinate clause actions were performed, and in the correct order) were produced 82% of the time. Crain's interpretation of these results was that the children's improved performance was due to the satisfaction of the presupposition of the subordinate clause.

However, we now note that the results of that study are open to another interpretation. It may be that improved performance was not due specifically to the contextual appropriateness of the sentence, but to the fact that the child's task was simplified because he was provided with more advance information concerning what his task would be. In the act-out or 'do-what-I-say' paradigm applied to temporal terms, the child must discern two aspects of the command: (i) which two toys to move, and (ii) in which order to move them. If the child has established his intent to move a particular toy, his task involving (i) is simplified. Thus, improved performance may be due to the satisfaction of presuppositions or it may be due to the additional information the child possesses.

Another study was conducted to disentangle these two factors (Gorrell, Crain, & Fodor, 1989). In this study, there were four groups of subjects. One group, the Felicity Group (F), was given commands containing before and after with prior information about the subordinate clause action, just as in the previous experiment. A second group, the Information Group (I), received prior information about the main clause action; note that this does not satisfy the presupposition of the sentence. There was also a third group, the No Context Group (NC), who received no advance information at all, and a fourth group, the Felicity plus Information Group (FI), who received information over and above what would satisfy the felicity conditions since they chose both actions in advance. Consider, for example, a subject in the F group. He would be asked to choose a toy to push.

If he chose the bus, for example, a typical command would be (31).

(31) Push the car before you push the bus.

On the other hand, a subject in the I group who had chosen the bus would be given the command (32).

(32) Push the bus before you push the car.

Fifty-six children participated in the study, ranging in age from 3;4 to 5;10 (mean 4;5). Each child was assigned to one of the four groups, which were of equal size and approximately matched for age. The 'game' equipment consisted of 6 toy vehicles arranged in a row on a table between the child and the experimenter. The stimulus set consisted of 12 commands spoken by the experimenter which the child was to act out. There were three sentences of each of the four types illustrated in (30) above. We were careful to balance order of choice with order of action and assignment to clause type.

The results showed a significant difference between the F and FI groups on one hand, and the I and NC groups on the other. Table 1 shows the percentages of correct responses, where a correct response consisted of performing both actions in the sequence specified by the sentence.

Table 1. Percentage of correct responses.

                                 Main Clause Information
                                  +          -         mean
  Subordinate Clause    +      FI 80%      F 74%       77%
  Information           -      I  51%      NC 59%      55%
                      mean        65%         66%

Note that the relevant factor is whether subordinate clause information was provided in advance. An analysis of variance confirmed that the mere amount of information provided makes no significant difference. The FI group performed better than the F group by only 6 percentage points, which does not approach statistical significance. And the I group performed just a little worse (non-significantly again) than the NC group.

Although our study was not specifically designed to assess age differences, we performed a post hoc breakdown of correct responses by two age groups: under 4;4, and 4;4 and over. The older group, as one would expect, performed somewhat better than the younger group. What is perhaps most interesting is that the younger group appear to be even more sensitive than the older group to the proper contextual embedding of utterances.6

A breakdown of the types of errors that occurred reveals that the predominant errors are (i) acting out the main clause only, and (ii) reversing the correct order of the actions. As noted above for relative clause constructions, acting out the main clause only is a quite reasonable response given that the context failed to satisfy the subordinate clause presupposition. And in fact most of these errors were found in the I and NC groups.7 Reversals were the most frequent error type in the study, though they constituted only 19% of all responses. These errors may reflect a genuine lack of comprehension of either the temporal terms or the relevant syntactic structure. However, no child in either the F or FI groups produced a consistent response pattern which would indicate that this was the case, so it seems more likely that these errors were due primarily just to occasional inattention.

'.'"te main conclusion we draw from these resultsis that children, from a very young age, are indeedsensitive to the proper contextual embedding oflanguage. Their perfor:. tnce is facilitated bysatisfyir.g the presuppositions of temporalsubordinate clauses, and information which doesnot satisfy the presuppositions does not result infacilitation.

A secondary conclusion is that children do construct the appropriate syntactic structure for sentences with embedded clauses. If the children in our study had failed to distinguish main from subordinate clauses (e.g., by assigning conjunction-type structure to the experimenter's commands), we would not expect to find the difference between the F and I groups we observed. Nor is it plausible to suggest that the children relied on a structure-independent formula of 'old information precedes new information.' For example, for the F group, the new information was always in the main clause. If children were assuming that old information would be first, we would have expected relatively poor performance from the F group on sentences in which the main clause preceded the subordinate clause. In fact, no such effect was observed.

In sum: once again, the linguistic knowledge of young children, when freed of interfering influences, appears to be quite advanced. Adults have the ability to set aside contextual factors in an unnatural experimental situation, but children, with their more limited cognitive and social skills, apparently do not have this ability. Consequently, they are highly sensitive to pragmatic infelicities; therefore their linguistic knowledge can be accurately appraised only by tests which include controls to insure that they are not penalized by their knowledge of pragmatic principles.

2.3. Plans

Another possible source of poor performance by children is in formulating the action plans which are needed in order to obey an imperative, or act out the content of a declarative sentence which they have successfully processed and understood. As we use the term, a plan is a mental representation used to guide action. A plan may be simple in structure, consisting of just a list of actions to be performed in sequence; or it may be internally complex, with loops and branches and other such structures now familiar in computer programs.

Formulating a plan is a skill that makes demands on memory and computational resources. In certain experimental tasks, these demands may outweigh those of the purely linguistic processing aspects of the task. So when children perform poorly, it is important to consider the possibility that formulating, storing or executing the relevant action plan is the source of the problem, rather than imperfect knowledge of the linguistic rules or an inability to apply them in parsing the sentence at hand.

2.3.1. Prenominal Modifiers. The first study on plans that we conducted was in response to the claim by Matthei (1982) and Roeper (1972) that 4- to 6-year-olds have difficulty in interpreting phrases such as (33), containing both an ordinal and a descriptive adjective.

(33) the second striped ball

Confronted with an array such as (34), many children selected item (ii), i.e., the ball which is second in the array and also is striped, rather than item (iv), which is the second of the striped balls (counting from the left as the children were trained to do).

(34) Array for "the second striped ball" [figure of five balls, (i)-(v), not reproduced; ball (ii) is the second ball in the array and is striped, ball (iv) is the second of the striped balls]


The empirical finding, then, appears to be that children assign an interpretation that is not the same as an adult would assign to expressions of this kind. This difference is attributed by Matthei to children's failure to adopt the hierarchical phrase structure internal to a noun phrase that characterizes the adult grammar. This structure is shown in (35). Instead, Matthei argues that children adopt a 'flat structure' for phrases of this kind, with both the ordinal and the descriptive adjective modifying the noun directly, as in (36).

(35) Hierarchical structure: [NP the [second [striped ball]]]

(36) Flat structure: [NP the second striped ball]
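The two structures yield different referents for the phrase. A small sketch (ours, with a made-up array mirroring (34)) of the two interpretations:

balls = [
    {"pos": 1, "striped": False},
    {"pos": 2, "striped": True},   # item (ii): second ball overall, and striped
    {"pos": 3, "striped": False},
    {"pos": 4, "striped": True},   # item (iv): second of the striped balls
    {"pos": 5, "striped": True},
]

def second_striped_hierarchical(balls):
    # Structure (35): 'second' applies to the set picked out by 'striped ball'.
    striped = [b for b in balls if b["striped"]]  # first restrict to striped balls
    return striped[1]                             # then take the second of them

def second_striped_flat(balls):
    # Structure (36): 'second' and 'striped' each modify 'ball' independently,
    # so the referent is the second item, provided it is striped.
    return next(b for b in balls if b["pos"] == 2 and b["striped"])

print(second_striped_hierarchical(balls)["pos"])  # 4 -> item (iv), the adult response
print(second_striped_flat(balls)["pos"])          # 2 -> item (ii), the reported error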

Any divergence between children's and adults' grammars poses a problem from the standpoint of language acquisition theory; namely, explaining how the child ultimately converges on the adult grammar without correction or other 'negative' feedback. Fortunately, there is no need to assume an error in the children's grammar in this case, for there is an alternative component of the language processor in which the errors might have arisen. In a series of experiments, Hamburger and Crain (1984) show that most children do assign the adult phrase structure and do understand the phrase correctly as referring to the second of the striped balls. The difficulty that children experience arises when they attempt to derive from this interpretation a procedure for actually identifying the relevant item in the array. An analysis of the logical structure of the necessary procedure shows it to be quite complex, significantly more so than the procedure for "count the striped balls," the kind of phrase Matthei used in a pretest in an attempt to show that children were able to cope with the nonsyntactic demands of the task.

This procedural account of the children's errors is supported by the sharp improvement in performance that results from three changes in method. One change is the inclusion of a pretask session in which the children handle and count homogeneous subsets of the items which are subsequently used in an array. This experience is assumed to prime some of the procedural planning required in the main experimental task. A second change in method is to withhold the display while the sentence is being uttered, so that formation and execution of the plan are less likely to interfere with each other. A dramatic improvement in performance on a phrase like (33) also results from first asking the child to identify the first striped ball, which forces him to plan and execute part of the procedure he will later need for (33). Facilitating the procedural aspects of the task thus makes it possible for the child to reveal his mastery of the syntax and semantics of such expressions.

Hamburger and Crain also found quite direct evidence that children do not assign the 'flat structure' analysis. The standard assumption in linguistics is that proforms corefer with a syntactic constituent. In the correct structure (35), the words striped and ball form a complete constituent, but in the incorrect structure (36) they do not. Thus the children should permit the proform one to corefer with striped ball only if they have the correct hierarchical structure. To find out whether they permit this coreference, they were tested on the instructions in (37), with the array in (38).

(37) Point to the first striped ball; point to the second one.

(38) [Array of balls, (i)-(v); figure not reproduced]

Hamburger and Crain found that the children consistently responded to the second instruction by pointing to (v) rather than (iv), showing that they took the proform one to corefer with the expression striped ball. Thus it appears that they do know the structure (35).

Finally, we note two experiments by Hamburger and Crain (1987). The purpose of these experiments was to provide empirical support for our claim that response planning is an important factor in psycholinguistic tasks, independently of syntax and semantics.

The first experiment attempts to show that children's ability to comprehend a phrase is inversely related to the complexity of the associated plan. For this purpose we compare phrases that are arguably equal to each other in syntactic complexity but differ in plan complexity. Examples are shown in (39), in increasing order of complexity of plans.

(39) i. John's biggest book
     ii. second green book
     iii. second biggest book


Planning complexity can also be deconfounded from the complexity of semantic constituent processing. Note that semantic considerations lead to the prediction that (39i) would be hardest, because its word meanings have to combine in a way contrary to the surface sequence of words (= the biggest of John's books).

The pattern of children's responses supports the predictions of our procedural account, and does not conform to the account based on syntactic or semantic constituency. Children responded correctly to phrases like (39i) 88% of the time. They gave correct answers only 39% of the time for examples like (39ii), and they were only 17% correct for phrases like (39iii). Thus, this experiment provides clear evidence that plans, not linguistic structures (syntactic or semantic), can determine processing success and failure for young children.

The second experiment addresses the cognitive difficulty of planning by prefacing the test with a sequence of exercises designed to alleviate the planning difficulty. This activity does not provide any extra exposure to the phrases tested; nevertheless, we anticipate a reduction in errors on these phrases. Consider a phrase such as "the second tallest building." To identify its referent, the interpreter must integrate sequential pairwise comparisons of relative size. In the pre-test activity the child would be shown a display of several objects of one type (say boxes), but of different sizes, and asked to hand the experimenter the biggest one. Then, once this object was removed from the array, the experimenter asked the child to perform the task again, saying, "Now, find the biggest box in this group." In this way the child would identify the second biggest box without ever hearing the phrase "the second biggest box" uttered. Children's comprehension of the phrase was tested before and after the preparatory task. They gave significantly more correct responses (46%) following the preparatory task than before it (8% in this experiment). This result suggests that their difficulty with phrases of this sort stems from the complexity of the response plans.
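The pre-test activity, in effect, walks the child through the component steps of the plan. A sketch (ours; the box data are invented) of that two-step procedure:

def biggest(boxes):
    return max(boxes, key=lambda b: b["size"])

def second_biggest(boxes):
    remaining = list(boxes)
    remaining.remove(biggest(remaining))  # step 1: hand over the biggest box
    return biggest(remaining)             # step 2: "find the biggest box in this group"

boxes = [{"name": "A", "size": 3}, {"name": "B", "size": 7}, {"name": "C", "size": 5}]
print(second_biggest(boxes)["name"])  # C -> the second biggest, never named as such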

3. SENTENCE PRODUCTION

To acquire a language is to learn a mapping between potential utterances and associated potential meanings. Successful mastery should reveal itself in both comprehension and production. In the previous section we were concerned with studies of children's comprehension, in which their knowledge is tested by presenting utterances and observing the interpretations that they assign. We now turn to tests of children's competence which proceed in the other direction: the input to the child is a situation, which has been designed to suggest a unique sentence meaning, and the behavior we observe is the utterance by which the child describes that situation.

It would have been reasonable to expect that the sorts of nonsyntactic problems that present obstacles for children in comprehension tasks might prove to be as hard or even harder for them to overcome in production tasks. But we have not found this to be the case. The results of recent elicited production studies are dramatically better than those of comprehension studies directed to the same linguistic constructions. For example, Richards (1976) elicited appropriate uses of the deictic verbs come and go from children aged 4;0-7;7, while Clark and Garnica (1974) reported that even 8-year-olds didn't consistently distinguish between come and go in a comprehension task.

The disparity between production and comprehension studies is particularly striking because it is the reverse of what one would expect. To find production superior to comprehension in children's language is as surprising as it would be to find production superior to comprehension in adult second-language learning, or to find recall superior to recognition in any psychological domain. It is plausible to argue, therefore, that the superiority of production is only apparent, and is due to differences in the sensitivities of production tests and comprehension tests. And the logic of the situation suggests that it is the comprehension tests that are deficient. After all, success is hard to argue with. With suitable controls, successful production by children is a strong indicator of underlying linguistic competence, as long as their productions are as appropriate and closely attuned to the context as adult utterances are. Because there are so many ways to combine words incorrectly, consistently correct combinations in the appropriate contexts are not likely to come about by accident. On the other hand, failure on any kind of psychological task cannot be secure evidence of lack of the relevant knowledge, since the knowledge may be present but imperfectly exploited.

As we saw in the previous section, comprehension studies seem to be particularly susceptible to problems of parsing, planning and so forth which impede the full exploitation of linguistic knowledge. Production tasks appear to be less hampered by these extra-grammatical factors. This is probably because production avoids non-verbal response planning, which we have seen is a major source of difficulty in act-out comprehension tasks. It is worth noting also that in constructing contexts to elicit particular utterance types, we have no choice but to attend to the satisfaction of the presuppositions that are associated with the syntactic structures in question, because otherwise the subjects won't utter anything like the construction that is being targeted. In elicited production it is delicate manipulations of the communicative situation that give one control over the subject's utterances.

3.1. Relative Clauses. In section 2 we presented evidence of young children's competence with relative clauses. Further confirmation was obtained by Hamburger and Crain (1982), using an elicited production methodology. Pragmatic contexts were constructed in which the presuppositions of restrictive relatives were satisfied. It was discovered that children as young as three reliably produce relative clauses in these contexts.

A context that is uniquely felicitous for a relative clause is one which requires the speaker to identify to an observer which of two objects to perform some action on. In our experiment, the observer is blindfolded during identification of a toy, so the child cannot identify it to the observer merely by pointing to it or saying this or that one. Also, the differentiating property of the relevant toy is not one that can be encoded merely with a noun (e.g., the guard) or a prenominal adjective (e.g., the big guard) or a prepositional phrase (e.g., the guard with the gun), but involves a more complex state or action (e.g., the guard that is shooting Darth Vader). Young children reliably produce meaningful utterances with relative clauses when these felicity conditions are met. For example:

(40) Jabba, please come over to point to the one that's asleep. (3;5)
     Point to the one that's standing up. (3;9)
     Point to the guy who's going to get killed. (3;9)
     Point to the kangaroo that's eating the strawberry ice cream. (3;11)

Note that the possibility of imitation is excluded because the experimenter takes care not to use any relative clause constructions in the elicitation situation. This technique has now been extended to younger children (as young as 2;8), and to the elicitation of a wider array of relative clause constructions, including relatives with object gaps (e.g., the guard that Princess Leia is standing on).

3.2. Passives. Borer and Wexler (1987) have argued that A-chains, which are involved in the derivation of verbal passive constructions, are not available to children in the first few years.8 Borer and Wexler maintain that knowledge of A-chains is innate, but becomes accessible only after the language faculty undergoes maturational change. We were not convinced, however, that this maturation hypothesis is necessitated by the facts. Rather, the facts seem to be consistent with A-chains being innate and accessible from the outset.

One source of data cited in support of the maturation hypothesis is the absence of full passives in the spontaneous speech of young children. But this of course is not incontrovertible evidence that children's grammars are incapable of generating passives. Full passives are rarely observed in adults' spontaneous speech either, or in adult speech to children. But their paucity is not interpreted in this case as revealing a lack of grammatical knowledge. Instead, it is understood as due to the fact that the passive is a marked form which it is appropriate to use in certain discourse contexts; in most contexts the active is acceptable and more natural, or a reduced passive without a by-phrase is sufficient. That is, the absence of full verbal passives in adult speech is assumed to be a consequence of the fact that it's only in rare situations that the full passive is uniquely felicitous. But the same logic that explains why adults produce so few full passives may apply equally to children. Perhaps they too have knowledge of this construction, but do not use it except where the communicative situation is appropriate.

We have tested this possibility in an experiment with thirty-two 3- and 4-year-old children (Crain, Thornton, & Murasugi, 1987). One experimenter asked the child to pose questions to another experimenter. The pragmatic context was carefully controlled so that questions containing a full verbal passive would be fully appropriate. The following protocol illustrates the elicitation technique:

Adult: See, the Incredible Hulk is hitting one of the soldiers. Look over here. Darth Vader goes over and hits a soldier. So Darth Vader is also hitting one of the soldiers. Ask Keiko which one.

Child to Keiko: Which soldier is getting hit by Darth Vader?


Note that the child knows what the correct answer is to his question, and that he cannot expect to elicit this answer from his interlocutor (Keiko) unless he includes the by-phrase. In fact, exactly 50% of responses were passives with full by-phrases. Of course, active constructions are also felicitous in this context (e.g., Which soldier is Darth Vader hitting?), even though the contextual contrast with another agent (the Incredible Hulk) may tend to favor the passive stylistically. And indeed 31% of responses were active questions with object gaps. The other 19% of responses included mostly sentences that were grammatical but not as specific as the context demanded (e.g., passives lacking by-phrases).

Using this technique, we were able to elicit full verbal passives from all but three of the thirty-two children tested so far, including ones as young as 3;4. Some examples are shown in (41).

(41) She got knocked down by the Smurfie. (3;4)
     Which girl is pushing, getting pushed by a car? (3;8)
     He got picked up from her. (3;11)
     It's getting ate up from Luke Skywalker. (4;0)
     Which giraffe gets huggen by Grover? (4;9)

Note that these utterances contain a variety of morphological and other errors, but they all nevertheless exhibit the essential passive structure (underlying subject in pre-verbal position; agent in post-verbal prepositional phrase).9 It might be argued that the children's passives elicited in this experiment do not involve true A-chains. However, since they are just like adult passives (disregarding morphological errors), the burden of proof falls on anyone who holds that adult passives involve A-chains and children's passives do not. No criterion has been proposed, as far as we know, which distinguishes adults' and children's passives in this respect. For example, it is true that the children almost always use a form of get in place of the passive auxiliary be, but get is acceptable in adult passives also. (Get is more regular and phonologically more prominent than forms of be, and this may be why it is more salient for children.)

Children's considerable success in producing passive sentences appropriate to the circumstances (i.e., their correct pairing of sentence forms and meanings) constitutes compelling evidence of their grammatical competence with this construction. Comparison of these results with the results of testing the same children with two comprehension paradigms (act-out and picture-verification) confirms that, like spontaneous production data, these measures underestimate children's linguistic knowledge.

The finding that young children evince mastery of the passive obviates the need to appeal to maturation to account for its absence in early child language. Maturation cannot of course be absolutely excluded; but a maturation account is motivated only where a construction is acquired surprisingly late, where this means later than would be expected on the basis of processing complexity, pragmatic usefulness in children's discourse, and so forth. (Also, as noted in section 1, some important cross-language and cross-construction correlations need to be established to confirm a maturational approach; see Borer & Wexler, 1987, on comparison of English passives with passive and causative constructions in Hebrew.) The elicited production results suggest that the age at which the passive is acquired in English falls well within a time span that is compatible with these other factors, and so maturation does not need to be invoked.

3.3. Wanna contraction. Another phenomenon that can be shown by elicitation to appear quite early in acquisition is wanna contraction in English. The facts are shown in (42) and (43).

(42)a. Who do you want to help?

b. Who do you wanna help?

(43)a. Who do you want to help you?

b. Who do you wanna help you?

Every adult is (implicitly) aware that contraction is admissible in (42b) but not (43b). However, on the usual assumption that children do not have access to 'negative data' (i.e., are not informed of which sentences are ungrammatical), it is difficult to see how this knowledge about the ungrammaticality of sentences like (43b) could be acquired from experience (at any age). So this is yet another candidate for innate linguistic knowledge. (What is known innately would be that a trace between two words prevents them from contracting together. The relevant difference between (42b) and (43b) is that in (43b) the who is the subject of the subordinate clause and has been moved from a position between the want and the to. The trace of this noun phrase that is left behind blocks the contraction. In (42b), by contrast, the trace is in object position after help, and therefore is not in the way of the contraction.)

Crain and Thornton (in press) used the elicited production technique to encourage children to ask questions that would reveal violations like (43b) if


these were compatible with their grammars. The target productions were evoked by having children pose questions to a rat who was too timid to talk to grown-ups. The details of the procedure are illustrated in the following scenarios:

Protocol for Object Extraction

Experimenter: The rat looks hungry. I bet he wants to eat something. Ask him what.

Child: What do you wanna eat?

Protocol for Subject Extraction

Experimenter: One of these guys gets to take a walk, one gets to take a nap, and one gets to eat a cookie. So one gets to eat a cookie, right? Ask Ratty who he wants.

Child: Who do you want to eat the cookie?

Using this technique, questions involving both subject and object extraction were elicited from 21 children, who ranged in age from 2;10 to 5;5, with an average age of 4;3. The preliminary findings of the experiment are clearly in accord with the expectations of the innateness hypothesis, although we must verify our own subjective assessment of these data using a panel of judges.10 In producing object extraction questions (which permit contraction in the adult grammar), children gave contracted forms 59% of the time and uncontracted forms 18% of the time. (There were 23% of other responses not of the target form, such as What can you eat to see in the dark?) By contrast, children's production of subject extraction questions (where contraction is illicit) contained contracted forms only 4% of the time and uncontracted forms 67% of the time (with 29% of other responses).

The systematic control of this subtle contrast could perhaps have been shown on the basis of spontaneous production data, but the crucial situations (particularly those that call for subject extraction questions) probably occur quite rarely in children's experience, just as they do in the case of the full passive. So it is not easy to gather data in sufficient quantity for statistical analysis. By contrast, the elicitation technique is obviously an efficient way of generating data, and thus facilitates testing for early acquisition of a variety of constructions relevant to the innateness hypothesis.

4. CONCLUSION

In this paper we have reviewed a great many empirical studies. The thread that ties them together is the idea that, when performance problems are minimized in testing situations, children show early knowledge of a wide range of basic constructions. As early as 1965, Bellugi suggested that children's errors on Wh-questions were due, not to a lack of knowledge of the two relevant transformations (Wh-movement and Subject/Auxiliary Inversion), but to a not yet fully developed capacity to apply both rules in the same sentence derivation. Our work extends this general idea to a broader set of linguistic phenomena. Our particular emphasis has been constructions which linguistic theory predicts should require little or no learning because they involve principles which are universal and hence innate. Our findings suggest that the innateness hypothesis for language is still secure even in its simplest form (in which different innate principles are not timed to mature at different developmental stages). Maturation of nonlinguistic abilities appears to be sufficient to account for the time course of linguistic development.

REFERENCES

Amidon, A., & Carey, P. (1972). Why five-year-olds cannot understand before and after. Journal of Verbal Learning and Verbal Behavior, 11, 417-423.

Bellugi, U. (1965). The development of interrogative structures in children's speech. In K. Riegel (Ed.), The development of language functions. University of Michigan Language Development Program, Ann Arbor, Report No. 8.

Borer, H., & Wexler, K. (1987). The maturation of syntax. In T. Roeper & E. Williams (Eds.), Parameter setting. Dordrecht, Holland: D. Reidel Publishing Company.

Chomsky, N. (1971). Problems of knowledge and freedom. New York: Pantheon Books.

Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. New York: Praeger.

Clark, E. V. (1971). On the acquisition of the meaning of before and after. Journal of Verbal Learning and Verbal Behavior, 10, 266-275.

Clark, E. V., & Garnica, O. K. (1974). Is he coming or going? On the acquisition of deictic verbs. Journal of Verbal Learning and Verbal Behavior, 15, 559-572.

Crain, S. (1982). Temporal terms: Mastery by age five. Papers and Reports on Child Language Development, 21, 33-38.

Crain, S., & Fodor, J. D. (1984). On the innateness of Subjacency. Proceedings of the Eastern States Conference on Linguistics (Vol. 1). Columbus: Ohio State University.

Crain, S., & McKee, C. (1985). Acquisition of structural restrictions on anaphora. Proceedings of the North Eastern Linguistic Society, 16. Amherst: University of Massachusetts.

Crain, S., & Nakayama, M. (1987). Structure-dependence in grammar formation. Language, 63, 522-543.

Crain, S., Thornton, R., & Murasugi, K. (1987). Capturing the evasive passive. Paper presented at the 12th Annual Boston University Conference on Language Development.

Crain, S., & Thornton, R. (in press). Recharting the course of language acquisition: Studies in elicited production. In N. Krasnegor, D. Rumbaugh, R. Schiefelbusch, & M. Studdert-Kennedy (Eds.), Biobehavioral foundations of language development. Hillsdale, NJ: Lawrence Erlbaum Associates.


Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6, 291-325.

Goodluck, H. (1986). Children's interpretation of pronouns and null NPs: Structure and strategy. In P. Fletcher & M. Garman (Eds.), Language acquisition: Studies in first language development (2nd ed.). Cambridge: Cambridge University Press.

Goodluck, H., & Tavakolian, S. (1982). Competence and processing in children's grammar of relative clauses. Cognition, 8, 389-416.

Gorrell, P., Crain, S., & Fodor, J. D. (1989). Contextual information and temporal terms. Journal of Child Language, 16, 623-632.

Hamburger, H., & Crain, S. (1982). Relative acquisition. In S. Kuczaj (Ed.), Language development (Vol. 1, pp. 245-274). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hamburger, H., & Crain, S. (1984). Acquisition of cognitive compiling. Cognition, 17, 85-136.

Hamburger, H., & Crain, S. (1987). Plans and semantics in human processing of language. Cognitive Science, 11, 101-136.

Hsu, J. R. (1981). The development of structural principles related to complement subject interpretation. Doctoral dissertation, The City University of New York.

Jakubowicz, C. (1984). On markedness and binding principles. Proceedings of the North Eastern Linguistic Society. Amherst, Massachusetts.

Johnson, H. (1975). The meaning of before and after for preschool children. Journal of Experimental Child Psychology, 19.

Kimball, J. (1973). Seven principles of surface structure parsing in natural language. Cognition, 2, 15-47.

Lasnik, H., & Crain, S. (1985). On the acquisition of pronominal reference. Lingua, 65, 135-154.

Lust, B. (1981). Constraint on anaphora in child language: A prediction for a universal. In S. Tavakolian (Ed.), Language acquisition and linguistic theory (pp. 74-96). Cambridge, MA: MIT Press.

Matthei, E. M. (1981). Children's interpretations of sentences containing reciprocals. In S. Tavakolian (Ed.), Language acquisition and linguistic theory (pp. 97-115). Cambridge, MA: MIT Press.

Matthei, E. M. (1982). The acquisition of prenominal modifier sequences. Cognition, 11, 301-331.

Nakayama, M. (1987). Performance factors in subject-aux inversion by children. Journal of Child Language, 14, 113-125.

Otsu, Y. (1981). Universal grammar and syntactic development in children: Toward a theory of syntactic development. Unpublished doctoral dissertation, M.I.T.

Phinney, M. (1981). The acquisition of embedded sentences and the NIC. Proceedings of the North Eastern Linguistic Society, 11. Amherst: University of Massachusetts.

Richards, M. M. (1976). Come and go reconsidered: Children's use of deictic verbs in contrived situations. Journal of Verbal Learning and Verbal Behavior, 15, 655-665.

Roeper, T. W. (1972). Approaches to a theory of language acquisition with examples from German children. Unpublished doctoral dissertation, Harvard University.

Roeper, T. W. (1986). How children acquire bound variables. In B. Lust (Ed.), Studies in the acquisition of anaphora, Volume I. Dordrecht, Holland: D. Reidel Publishing Company.

Solan, L. (1983). Pronominal reference: Child language and the theory of grammar. Dordrecht, Holland: D. Reidel Publishing Company.

Solan, L., & Roeper, T. W. (1978). Children's use of syntactic structure in interpreting relative clauses. In H. Goodluck & L. Solan (Eds.), Papers in the structure and development of child language. UMASS Occasional Papers in Linguistics (Vol. 4, pp. 105-126).

Tavakolian, S. L. (1978). Children's comprehension of pronominal subjects and missing subjects in complicated sentences. In H. Goodluck & L. Solan (Eds.), Papers in the structure and development of child language. UMASS Occasional Papers in Linguistics (Vol. 4, pp. 145-152).

Tavakolian, S. L. (1981). The conjoined-clause analysis of relative clauses. In S. Tavakolian (Ed.), Language acquisition and linguistic theory (pp. 167-187). Cambridge, MA: MIT Press.

de Villiers, J. G., & de Villiers, P. A. (1986). The acquisition of English. In D. Slobin (Ed.), The crosslinguistic study of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum Associates.

Wexler, K., & Chien, Y. (1985). The development of lexical anaphors and pronouns. Papers and Reports on Child Language Development. Stanford, CA: Stanford University.

FOOTNOTES'language and cognition: A developmental perspective (in press).Norwood, NJ: Ablex.

tAlso University of Connecticut, Storrs.ttAlso Graduate Center, City University of New York.'Susan Carey has pointed out (Boston UniversityConference,1988) that the linguistic maturation hypothesis predicts thatknowledge of a linguistic principle should correlate withgestation age rather than with birth age in children bornprematurely. Unfortunately, variability is probably such thatno dear correlation could be expected to show itself of age 4 or5, when passives and other relevant syntactic constructionsareclaimed to emerge.

2For full details of procedure and results of this experiment, andof all other studies reported in this paper, we refer renews tothe original publications.

3As far as is known at present, no natural language exhibits thisblanket prohibition against backward pronominalization; seediscussion in Lasnik and Crain (1985). This suggests that it isnot a possible constraint in a natural language grammar, inwhich case it should not be entertained by childrenat any ageor stage of acquisition (unless one assumes linguisticmaturation).

4The awkwardness of the prosodic contour for (18), with itsheavy juncture before the final word, may indicate that thiskind of construction is also an unnatural one for the sentenceproduction routines.

51n reviewing the literature on relative clauses, de' tillers andde Villiers (1986) suggest that if earlier work had counted theassertion-only response as correct, children would have beenseen to perform better there too. This objection is unwarranted,for two reasons. First, responses of this type did not appear inother studies, presumably because these studies failed to ..lootthe presuppositions of the restrictive relative clause. Moreimportant, in the Hamburger and Crain study this responsewas not evinced by any of the 3-year-old children, andaccounted for only 13% of the responses of the 4-year-olds.Nevertheless, even the 3-year-olds acted out sentences withrelative clauses at a much higher rate (69%) of success than inearlier studies.

6These results should be interpreted with caution due to the small and unequal number of subjects in each subgroup, leading to rather uneven data. For example, the older F group performed relatively poorly compared to the younger F group, though closer analysis reveals that this is due to the poor performance of just one child (4;4) in the F group.

7There were no main-clause-only errors in the F1 group. For the F group, 12 of the 16 main-clause-only errors (out of 168 responses) are due to one child.

8An A-chain is the association of a trace with a moved noun phrase in an A-position (= argument position such as Subject). For example, in The bagel was eaten by Bill there is an A-chain consisting of the bagel and its associated trace after eaten.

9The proper reversal of underlying subject and object order occurred even when the task was complicated by an implausible scene to be described. For example, the sentence One dinosaur's being eated from the ice cream cone was used to describe a situation in which the dinosaur was indeed being eaten by the ice cream, not vice versa.

10In preliminary evaluation of the audio tapes, we have found it unexpectedly easy to distinguish children's contracted and non-contracted forms in most cases.


Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 137-147

Cues to the Perception of Taiwanese Tones*

Hwei-Bing Lin† and Bruno H. Repp

A labeling test with synthetic speech stimuli was carried out to determine to what extent the two dimensions of fundamental frequency (Fo), height and movement, and syllable duration provide cues to tonal distinctions in Taiwanese. The data show that the high level vs. mid level tones and the high falling vs. mid falling tones can be reliably distinguished by Fo height alone, whereas the distinction between tones with dissimilar contours, such as the high falling and low rising tones, is predominantly cued by Fo movement. However, the other dimension of Fo may collaborate with the dominant one in cueing a tonal contrast, depending on the extent to which the two tones differ along that dimension. Syllable duration has a small additional effect on the perception of the distinction between falling and nonfalling tones. These results are consistent with previous findings in tone languages other than Taiwanese in that they suggest that tones are mainly cued by Fo. While the primacy of Fo dimensions as cues to tonal contrasts depends on the contrast to be distinguished, the present findings show that tones which nominally differ only in register (e.g., high falling vs. mid falling) may exhibit perceptually relevant contour differences, and vice versa.

INTRODUCTION

The primary acoustic attribute distinguishing linguistic tones is their fundamental frequency (Fo), although duration and amplitude of the syllable carrying the tone may also exhibit characteristic differences. This observation raises the question of whether, in perceiving a phonemic tone, listeners integrate all these acoustic cues, or whether they pay attention to Fo alone.

Several studies have investigated the role of Fo versus other properties as cues to tonal distinctions, in both synthetic and natural speech. For example, Abramson (1962) imposed artificial Fo movements on natural Thai monosyllables by means of a vocoder and found that Fo overpowered other concomitant features such as

*This research, which formed part of the first author's doctoral dissertation, was supported by NICHD Grant HD-01994 to Haskins Laboratories. Special thanks are due to Arthur Abramson, Carol Fowler, and Ignatius Mattingly for their helpful comments on an earlier version of this paper, and to Jackson Gandour and Eva Gårding for serving as reviewers for Language and Speech.

duration and amplitude in cueing tonal distinctions. The primacy of Fo was confirmed in a later study using synthetic Thai speech (Abramson, 1975), though addition of natural amplitude contours improved identification further. The conclusion that Fo carries sufficient information for conveying tonal distinctions has also been drawn by Howie (1976), Tseng (1981), and M.-C. Lin (1987), who investigated the tones of Mandarin Chinese. However, these studies either did not vary duration and amplitude at all, or they pitted these dimensions against unambiguous Fo contours. In whispered monosyllables, where Fo is altogether absent but duration and amplitude differences may be retained to some extent, tonal distinctions are resolved poorly, though above chance level (e.g., Abramson, 1972; Howie, 1976). It is conceivable that, in addition to increasing the naturalness of utterances (cf. M.-C. Lin, 1987; Rumyantsev, 1987), duration and amplitude have larger effects on tone identification when Fo provides ambiguous information. Also, the relative informativeness of different acoustic cues for tonal identity may vary across languages.


The Fo dimension itself may be decomposed into two aspects: height and movement.1 The relative importance of these two aspects obviously depends on the nature of the tonal distinction to be made: If the distinction is between two tones differing primarily in register (e.g., "high" vs. "low"), Fo height will be important; if two tones differ primarily in contour (e.g., "rising" vs. "falling"), Fo movement will be the dominant cue. However, for tones that, according to linguistic nomenclature, differ only in register or in contour, the other aspect of Fo might play a secondary role in cueing the contrast. Furthermore, for tones that differ in both register and contour (e.g., "high falling" vs. "low rising"), both Fo height and movement may be relevant, though perhaps not equally important. Their relative importance may depend on what other tones there are in the language.

With these issues in mind, we conducted the present study to determine the relative importance of Fo height, Fo movement, and duration as cues to the tonal distinctions of Taiwanese. From traditional classifications by phonologists and from acoustic studies (Chiang, 1967; H.-B. Lin, 1988; Zee, 1978) it is evident that Taiwanese has five long tones, whose typical Fo contours are illustrated in Figure 1.2 They fall into two classes: Tones 1 ("high level") and 5 ("mid level") are fairly level or static, whereas tones 2 ("high falling"), 3 ("mid falling"), and 4 ("low rising") are contoured or dynamic. Two pairs of tones, 1 vs. 5, and 2 vs. 3, have similar Fo contours but differ in register. We would thus expect Fo height to play a primary role in the distinction of these tone pairs. As for the tonal pairs with different Fo contours, we expected that Fo movement rather than height would be responsible for cueing the differences. However, we wondered whether the "other" aspect of Fo would make a contribution to a distinction as well. For example, would the small difference in Fo movement between the two level tones, 1 and 5, or the larger one between the two falling tones, 2 and 3, play any role in perception at all? And would Fo height be relevant to the distinction, for instance, between tones 1 and 4, even though they differ in contour? The answers to these questions seemed less obvious.

With respect to duration, the five Taiwanese long tones fall into two groups: relatively long (tones 1, 4, and 5) and relatively short (tones 2 and 3). In isolated syllables, the respective average durations were 145 ms and 75 ms (H.-B. Lin, 1988; see Figure 1). Thus, in principle, duration could be a rather strong cue for the distinction between falling and nonfalling tones.

[Figure 1 graph not reproduced: Fo contours of the five tones plotted against duration (ms).]

Figure 1. Average Fo movements of five Taiwanese tones produced by three male speakers on the syllable /do/ in sentence-final position. (Data from H.-B. Lin, 1988.)

Our approach was to synthesize a variety of Fo patterns by varying Fo height, Fo movement, and duration independently in isolated syllables.3 In some of our stimuli, the Fo heights, movements, and durations of typical Taiwanese tones were juxtaposed in novel combinations, so the relative strength of these competing cues could be assessed. In addition, we synthesized Fo patterns intermediate between those of the original tones in terms of Fo height and/or movement. These relatively ambiguous stimuli provided the best opportunity to observe effects of secondary cues, such as duration or the "other" Fo dimension, on perception of a given tonal contrast. Even though our stimuli were presented singly, we conceptualized the study in terms of pairwise tonal contrasts, for heuristic convenience. Listeners, of course, always had all five lexical alternatives in mind as they tried to identify our synthetic syllables.

METHODS

Materials

The syllable /do/ was modelled after a natural Taiwanese utterance on the software serial formant synthesizer at Haskins Laboratories. Since the Taiwanese /d/ is unaspirated and voiceless (i.e., [t]), only a 10-ms release burst preceded the onset of voicing. The onset frequencies of the first three formants were 450 Hz, 1160 Hz, and 2400 Hz. After 70 ms, they reached respective steady states of 560 Hz, 760 Hz, and 2000 Hz. The amplitude of the voicing source was kept at a constant value. The extreme Fo values and durations of the five basic tones are shown in Table 1. The actual Fo movements followed the natural models (cf. Figure 1) as closely as possible.

Table 1. Average Fo onset and offset values (in Hz) and durations (in ms) of the five synthetic tones on /do/. The "pivot" indicates the turning point of the Fo movement in tone 4.

Tone    onset    pivot    offset    duration
1       130      -        129       145
2       150      -        96        75
3       109      -        80        75
4       102      94       105       145
5       113      -        107       145

To assess the relative importance of Fo height, Fo movement, and duration cues in the perception of tonal distinctions, these properties were varied independently within each pairwise contrast. Thus we synthesized, in addition to the original tones, stimuli in which the Fo movement of tone X was combined with the Fo height of tone Y, and vice versa. This involved a translation of the whole Fo movement up or down the linear frequency axis, by adding a positive or negative constant to all Fo values in the synthesis specifications. We defined Fo height operationally as the onset frequency of a tone.4 In addition, we created an Fo movement intermediate between the two original tonal contours by averaging their Fo values, and we chose an intermediate Fo onset frequency as well. Thus we had three Fo movements (X, Y, and intermediate) and three Fo heights (X, Y, and intermediate), all combinations of which resulted in nine stimuli for any given tonal contrast. The stimulus set for the tone 2-4 contrast is illustrated in Figure 2; tone 24 denotes the stimulus intermediate between tones 2 and 4 in both Fo height and Fo movement.5
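In outline, the construction of one nine-member stimulus set reduces to a frequency translation and a pointwise average. The following is a minimal sketch of that logic, not the authors' synthesis code; Fo contours are represented simply as lists of Fo values (in Hz) sampled at matching time points, with onsets taken from Table 1.

    def shift_to_onset(contour, target_onset):
        # Translate a whole Fo movement along the linear frequency axis by
        # adding a constant to every value, so its onset equals target_onset.
        delta = target_onset - contour[0]
        return [f + delta for f in contour]

    def average_contours(a, b):
        # Pointwise average of two Fo movements gives the intermediate one.
        return [(x + y) / 2 for x, y in zip(a, b)]

    def stimulus_set(contour_x, contour_y):
        # All 3 movements x 3 heights = 9 stimuli for the contrast X vs. Y;
        # Fo height is defined operationally as the onset frequency.
        movements = {"X": contour_x, "Y": contour_y,
                     "XY": average_contours(contour_x, contour_y)}
        heights = {"X": contour_x[0], "Y": contour_y[0],
                   "XY": (contour_x[0] + contour_y[0]) / 2}
        return {(m, h): shift_to_onset(c, onset)
                for m, c in movements.items()
                for h, onset in heights.items()}

    # Tone 2 (150 -> 96 Hz) vs. tone 4 (102 -> 94 -> 105 Hz), coarsely sampled:
    stimuli = stimulus_set([150, 123, 96], [102, 94, 105])  # 9 Fo patterns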

Given five original tones, there were 10 pairs of tonal contrast. In four of these (2-3, 1-4, 1-5, 4-5), the durations of the two original tones were the same, and so all nine stimuli were synthesized at the same duration. In the six other contrasts (1-2, 1-3, 2-4, 2-5, 3-4, 3-5), the two original tones had different durations (cf. Table 1). For these, we synthesized the set of nine stimuli at three different durations, the two original ones (75 and 145 ms) and one intermediate one (110 ms), which resulted in 27 stimuli. When the duration of an original Fo movement was changed, its onset and offset frequencies were maintained, but the Fo trajectory was stylized by linear interpolation between the extreme values. In the case of tone 4, the location of the pivot was moved in proportion to the duration change, and two linear interpolations were performed.
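A rough sketch of this duration manipulation, under stated assumptions: endpoints are held fixed and the trajectory is linearly interpolated at the new duration; for tone 4, the pivot is repositioned in proportion to the duration change. The pivot's fractional position (0.65 below) and the 5-ms sampling step are assumed values for illustration; the text does not give them.

    def linear_ramp(f_start, f_end, n_points):
        # n_points Fo values linearly interpolated from f_start to f_end.
        if n_points < 2:
            return [f_end]
        step = (f_end - f_start) / (n_points - 1)
        return [f_start + i * step for i in range(n_points)]

    def rescale_tone(onset, offset, duration_ms, pivot=None, pivot_frac=0.65,
                     sample_ms=5):
        # Stylized Fo contour at the requested duration. Onset and offset are
        # preserved; a dipping tone gets two linear segments meeting at the
        # pivot, whose position scales with duration (pivot_frac is assumed).
        n = max(2, duration_ms // sample_ms)
        if pivot is None:
            return linear_ramp(onset, offset, n)
        n1 = max(2, round(n * pivot_frac))
        return linear_ramp(onset, pivot, n1) + linear_ramp(pivot, offset, n - n1 + 1)[1:]

    tone2_short = rescale_tone(150, 96, 75)    # original tone 2 duration
    tone2_long  = rescale_tone(150, 96, 145)   # same endpoints, shallower slope
    tone4_mid   = rescale_tone(102, 105, 110, pivot=94)  # tone 4 at 110 ms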

[Figure 2 graph not reproduced: fundamental frequency (Hz, roughly 40-160) plotted against duration (%).]

Figure 2. Example of a stimulus set for a particular tonal contrast: tone 2 versus tone 4. The original (2 and 4) and intermediate (24) Fo movements are shown by the heavy lines; the other patterns were obtained by changing the onset frequencies of these three Fo movements. (See also Footnote 5.)

In all, 6 × 27 + 4 × 9 = 198 stimuli were created on the synthesizer, though a number of these were identical or closely similar to each other.6 The stimuli were recorded in five different randomizations, each on a separate audio tape. On each tape, there were six blocks of 33 stimuli, separated by 8 sec of silence. There was a 3.5-sec interstimulus interval within blocks. Before the presentation of the test tapes, subjects had a chance to familiarize themselves with the original synthetic tones on the /do/ syllable. The order of test tapes was counterbalanced across subjects, and the experimental session took about 90 minutes.
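The stimulus count can be checked in a few lines; this sketch simply re-derives the arithmetic from the tone durations in Table 1.

    from itertools import combinations

    duration = {1: 145, 2: 75, 3: 75, 4: 145, 5: 145}  # ms, from Table 1
    total = 0
    for a, b in combinations(duration, 2):   # the 10 pairwise tonal contrasts
        n_durations = 1 if duration[a] == duration[b] else 3
        total += 9 * n_durations             # 3 Fo movements x 3 Fo heights
    print(total)                             # -> 198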

Subjects

Four male and four female listeners were recruited from University of Connecticut and Yale University graduate students and were paid for their participation. All were native speakers of Taiwanese and reported no history of speech or hearing disorder. Like all educated Taiwanese, they were also fluent in English and Mandarin Chinese.7


Procedure

The stimuli were presented binaurally over headphones at a comfortable listening level. Listeners were instructed to identify each /do/ syllable as (1) 'city,' (2) 'gambling,' (3) 'envy,' (4) 'map,' or (5) 'surname' by writing down the number of the choice. The response choices were listed on the answer sheet in Chinese characters. Subjects were told not to leave any blanks.

RESULTS

The results for the 10 tonal contrasts will be discussed in the following order: First, the two contrasts that have nominally identical contours but differ in register (1-5, 2-3); then, two contrasts that have similar Fo movements and differ in Fo height (4-5, 1-4); finally, the remaining six contrasts which have very dissimilar Fo movements (as well as differences in duration), grouped into three pairs exhibiting different magnitudes of Fo height difference (2-1, 2-5; 5-3, 1-3; 3-4, 2-4). In naming the members of a tonal pair, the one with the higher onset frequency is always named first (hence 2-1 and 5-3).

Tables 2 to 11 show the response distributions for the stimuli pertaining to each of the 10 pairs of tonal contrast. In each table, stimuli are coded in terms of Fo movement and height, with two-digit numbers standing for the intermediate values on these two dimensions (e.g., 15 is intermediate between the original tones 1 and 5). To simplify the tables, the results have been averaged across stimuli differing in duration; effects of stimulus duration will be discussed later. Thus, each stimulus set includes nine types. The average recognition rate for the five synthetic syllables modelling the original tones was 91% correct, which confirms that these stimuli were satisfactorily synthesized (cf. Footnote 3).

Tones with the same contour, differing in register

Presumably, the more pronounced the difference in an acoustic property between two tones, the more important it will be as a cue to the tonal distinction. If, moreover, differences along other psychoacoustic dimensions are small, it will emerge as the dominant cue. Thus, for pairs such as tones 1 vs. 5 and 2 vs. 3, which nominally have the same Fo contour but differ in register, Fo height was expected to be the dominant cue. However, tones 2 and 3, at least, also exhibit a difference in Fo movement (see Figure 1), and we wondered whether that dimension would contribute to the perceptual distinction.

Tone 1 vs. tone 5. Tones 1 (high level) and 5 (mid level) have almost identical, flat Fo movements; the difference between them is in Fo height. The data in Table 2 reveal that Fo height indeed plays a primary role in the perception of this distinction. Movement 1 with height 5 was identified predominantly as tone 5, whereas movement 5 with height 1 was classified as tone 1. There were virtually no confusions with other tones. Did Fo movement have any cue value at all? The stimuli with intermediate Fo height, which should have been the most sensitive indicators of Fo movement effects, suggest a negative answer. However, movement 1 with height 5 received only 70% tone 5 responses, whereas the original synthetic tone 5 received 90%. This difference notwithstanding, the effect of Fo movement on the perception of this tonal distinction seems negligible, simply because the difference between the tones is minimal on this dimension.

Table 2. High level tone (1) vs. mid level tone (5).

[Response percentages garbled in the source scan: rows pair each Fo movement (1, 15, 5) with each Fo height (1, 15, 5); see the text for the key values.]

Tone 2 vs. tone 3. The falling tones 2 and 3 nominally have the same Fo contour, though Fo in tone 2 falls somewhat more steeply than in tone 3. The main difference between them is in Fo height. The data in Table 3 confirm that Fo height is the major cue for the distinction: Movement 2 with height 3 was identified mostly as tone 3, and movement 3 with height 2 as tone 2. However, there was also some effect of Fo movement: At each of the three Fo heights, more tone 3 responses were obtained when the movement derived from tone 3 rather than from tone 2. Thus, even though both tones are considered merely "falling" in traditional phonological terminology, there is in fact a perceptually relevant difference in Fo movement between them. The difference in Fo height, however, is clearly the dominant cue.


Table 3. High falling tone (2) vs. mid falling tone (3).

[Response percentages garbled in the source scan: rows pair each Fo movement (2, 23, 3) with each Fo height (2, 23, 3); see the text for the key values.]

Tones with similar contours, differing in register

The contour of tone 4, referred to here as "rising," is actually a complex falling-rising or dipping Fo movement with a relatively limited range (cf. Figure 1). Because of that limited range, it looks somewhat similar to the flat contours of tones 1 and 5, though to the ear it may be quite dissimilar. In contrasting tones 5 vs. 4 and 1 vs. 4, which differ in both Fo register and contour, we expected both dimensions of Fo to be perceptually relevant. The question was which of them would carry more weight.

Tone 4 vs. tone 5. Tones 4 (low rising) and 5 (mid level) are relatively similar in Fo movement during the first half of their durations, but during the second half tone 4 moves up and almost merges with tone 5. There is also a difference in Fo height, tone 4 having a lower onset than tone 5. However, the data in Table 4 reveal Fo movement to be the primary cue:

Table 4. Mid level tone (5) vs. low rising tone (4).

[Response percentages garbled in the source scan: rows pair each Fo movement (5, 54, 4) with each Fo height (5, 54, 4); see the text for the key values.]

High intelligibility of both tones was maintained when their original Fo movements were presented at uncharacteristic heights. An effect of Fo height emerged only when Fo movement was intermediate. Evidently, Fo movement is the dominant cue to this distinction. This is also suggested by the finding that tone 4 responses predominated for the intermediate Fo movement: Detection of even a slight "dip" was sufficient to elicit tone 4 percepts.

Tone 1 vs. tone 4. Since tone 1 (high level) is very similar to tone 5 in Fo movement but higher in register, the Fo movement difference between tones 1 and 4 is similar to that between tones 5 and 4, just discussed; only the difference in height is larger. Does this imply a larger role of Fo height in cueing the distinction? Table 5 suggests an affirmative answer, but it is evident that Fo movement is still the dominant cue:

Table 5. High level tone (1) vs. low rising tone (4).

[Response percentages garbled in the source scan: rows pair each Fo movement (1, 14, 4) with each Fo height (1, 14, 4); see the text for the key values.]

Movement 4 with height 1 was still predominantly identified as tone 4, and movement 1 with height 4 was identified as tone 5, which shares the contour with tone 1. A small effect of Fo height is evident with the original tone 4 movement: At a very high Fo, some tone 1 responses did occur. A large effect of Fo height was obtained for the intermediate movement stimuli, whose identification changed from tone 1 to tone 4 as the height was lowered. This shift is larger than that observed in Table 4, in accord with the larger height difference for the present contrast.

Tones with dissimilar contours

The falling Fo contour (tones 2 and 3) is acoustically and perceptually very dissimilar from the level and rising contours; it is also carried on a shorter syllable, though differences in duration will be ignored for the time being. Since Fo movement proved to be perceptually important even for distinguishing tones with relatively similar contours, we certainly expected that tonal contrasts involving falling tones would be primarily cued by Fo movement, with secondary contributions of Fo height depending on the amount of the difference. In fact, it will be seen that, because for each falling tone and each level tone in Taiwanese there is a similar tone differing in register, stimuli with altered Fo height were often identified as tones other than those in the particular contrast under consideration.

Tone 2 vs. tone 1. Tones 2 (high falling) and 1 (high level), though they are both nominally "high," do show a difference in onset frequency. The most striking difference, however, is in their Fo movements. As the data in Table 6 show, Fo movement is indeed the dominant cue to the contrast: Movement 2 with height 1 was still mostly identified as tone 2, whereas movement 1 with height 2 was recognized as tone 1. The stimuli with the intermediate Fo movement showed some effects of Fo height on their identification as tone 1, but the effect was such that tone 1 responses increased as the height was raised, even though the higher onset frequency derived from tone 2; there was no effect of height on tone 2 responses. The paradoxical effect of Fo height derived from a tendency to identify the intermediate Fo movement as tone 3, which indeed has a contour intermediate between tones 2 and 1, and occasionally even as tone 5; both of these tendencies to give mid-register tone responses increased as Fo was lowered, at the expense of tone 1 responses. Basically, Fo height seems to be irrelevant to the perception of the tone 2 vs. 1 contrast.

Table 6. High falling tone (2) vs. high level tone (1).

[Response percentages garbled in the source scan: rows pair each Fo movement (2, 21, 1) with each Fo height (2, 21, 1); see the text for the key values.]

Tone 2 vs. tone 5. The difference in Fo movement between tones 2 (high falling) and 5 (mid level) is the same as that between tones 2 and 1, just discussed, but the difference in Fo height (onset) is much larger. The data presented in Table 7 show that predictable confusions occurred as Fo height was changed: Movement 2 with height 5 was often identified as tone 3, while movement 5 with height 2 was obviously tone 1. Stimuli with intermediate movement were predominantly identified as falling tones (tones 2 or 3, depending on Fo height), though at a very high Fo (above the characteristic height of tone 1) some tone 1 responses occurred. In general, however, Fo movement remained the overriding cue for this falling versus level tone distinction.

Table 7. High falling tone (2) vs. mid level tone (5).

[Response percentages garbled in the source scan: rows pair each Fo movement (2, 25, 5) with each Fo height (2, 25, 5); see the text for the key values.]

Tone 5 vs. tone 3. Here we have another level versus falling contrast, at a lower Fo height. Tones 3 (mid falling) and 5 (mid level) originate at almost the same frequency, so no effect of Fo height was expected. The data in Table 8 confirm that the dominant cue for the tone 5 versus tone 3 distinction is Fo movement, though the stimuli with the intermediate movement do show a small effect of Fo height. The copies of the original tones were not identified very well in this pair of tones; this reflects in part their inherent confusability with tones having the same contour (tones 1 and 2, respectively) but, in addition, the neutralization of duration cues and stylization of Fo movements may have increased the number of confusions. This general ambiguity may have made listeners extra sensitive to small differences in Fo height.


Table 8. Mid level tone (5) vs. mid falling tone (3).

[Response percentages garbled in the source scan: rows pair each Fo movement (5, 53, 3) with each Fo height (5, 53, 3); see the text for the key values.]

Tone 1 vs. tone 3. Tones 1 (high level) and 3 (mid falling) not only differ in Fo movement, as do tones 5 and 3, but also in Fo height. As can be seen in Table 9, subjects' responses changed considerably with both changes in Fo movement and in Fo height. However, changes in height led to predictable "confusions": Movement 1 (level) with height 3 (mid) was largely identified as tone 5 (mid level), whereas movement 3 (falling) with height 1 (high) was most often classified as tone 2 (high falling). These responses thus merely reflect the fact that Fo height cues the tone 1 vs. tone 5 and tone 2 vs. tone 3 distinctions. The intermediate movement stimuli, however, do reveal a genuine effect of Fo height on the contrast between tones 1 and 3: Tone 1 responses decreased and tone 3 responses increased as height was lowered, with an increase in tone 5 confusions in the middle. Thus it appears that both Fo height and movement are important cues for the distinction between tones 1 and 3, just as we expected.

Table 9. High level tone (1) vs. mid falling tone (3).

[Response percentages garbled in the source scan: rows pair each Fo movement (1, 13, 3) with each Fo height (1, 13, 3); see the text for the key values.]

Tone 3 vs. tone 4. There is a striking difference in Fo movement between tone 3 (mid falling) and tone 4 (low rising). However, the difference in Fo onset is small. The data in Table 10 confirm that Fo movement is the primary cue for the distinction between these tones: Small changes in Fo height left the responses to the original Fo movements unchanged. The stimuli with the intermediate movement did show an effect of Fo height, despite the relatively small physical differences involved. However, the effect was less on identification of these stimuli as tones 3 or 4, but primarily on their identification as tone 5 (mid level). Indeed, the intermediate movement was relatively flat and thus could be mistaken for the mid level tone when its height was raised. For the tone 3 vs. tone 4 distinction, therefore, Fo height seems to be of little importance.

Table 10. Mid falling tone (3) vs. low rising tone (4).

[Response percentages garbled in the source scan: rows pair each Fo movement (3, 34, 4) with each Fo height (3, 34, 4); see the text for the key values.]

Tone 2 vs. tone 4. Tones 2 (high falling) and 4 (low rising) show the sharpest Fo movement contrast of any tone pair, as well as the largest difference in Fo height. As can be seen in Table 11, movement 2 with height 4 was identified as tone 3 (mid falling), which is not surprising. Interestingly, however, movement 4 with height 2 was identified as tone 1 (high level), even though movement 4 with height 1 was not so identified (see Table 5). Thus, the movement barrier between tones 1 and 4 can be overcome by a sufficient raising of Fo height. The intermediate movement stimuli (somewhat falling with a dip) were never labeled as tone 4, but were highly ambiguous at a high Fo and perceived as tone 3 at a low Fo. Clearly, both Fo movement and height are important for the tone 2 versus tone 4 distinction, though the actual weights of these cues are difficult to gauge because of the intrusion of other responses.


Table 11. High falling tone (2) vs. low rising tone (4).

[Response percentages garbled in the source scan: rows pair each Fo movement (2, 24, 4) with each Fo height (2, 24, 4); see the text for the key values.]

Overview of Fo effects

In the preceding discussion of pairwise tonal contrasts, stimuli included in one particular subset were often also relevant to the perception of other tones, as evidenced by the various response intrusions. It is useful, therefore, to survey the pattern of predominant identification responses for the complete set of stimuli, still disregarding variations in duration. Table 12 provides such an overview. In its columns, Fo heights are arranged in terms of decreasing onset frequencies, and in its rows Fo movements are ordered in terms of decreasing differences between the onset and offset frequencies (with one minor, deliberate reversal). Each original height (movement) occurred with 9 different movements (heights), whereas each intermediate height (movement) occurred with 3 different movements (heights). For each height-movement combination, Table 12 lists all tonal categories with more than 20% of responses, in rank order.
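Assembling such an overview amounts to two sorts and a 20% threshold. Here is a minimal sketch of that tabulation; the response data are hypothetical placeholders, and only the five original tones are listed for brevity.

    # Onset frequencies and onset-offset differences (Hz) from Table 1.
    onset = {"2": 150, "1": 130, "5": 113, "3": 109, "4": 102}
    fall  = {"2": 45, "3": 29, "5": 6, "1": 1, "4": -3}

    # (movement, height) -> {tone: percent}; illustrative values only.
    responses = {("2", "2"): {2: 96.6, 3: 3.3},
                 ("4", "4"): {4: 95.0, 3: 4.1, 1: 0.8}}

    heights = sorted(onset, key=onset.get, reverse=True)   # decreasing onset
    movements = sorted(fall, key=fall.get, reverse=True)   # decreasing fall

    for mov in movements:
        for hgt in heights:
            dist = responses.get((mov, hgt))
            if dist:
                top = [t for t, p in sorted(dist.items(), key=lambda kv: -kv[1])
                       if p > 20]
                print(mov, hgt, top)   # predominant categories, e.g. 2 2 [2]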

The first five columns show that strongly falling Fo movements were perceived as either tone 2 or tone 3, depending on Fo height. The secondary relevance of Fo movement to this distinction can be seen in the fact that tone 3 responses increased as the steepness of the Fo movement decreased. The next four columns show that moderately falling Fo movements were perceived as tone 3 or tone 5 at the lower Fo onsets; stimuli with high onsets were not well sampled here, but suggest tone 1 responses with tone 2 as the second choice. The next three columns show that shallow falling Fo movements (with an onset-offset difference of 6 Hz or less) were invariably identified as level tones: as tone 1 or as tone 5, depending on Fo height. Finally, the last three columns show that, when the curvature of tone 4 is imposed on a flat Fo movement, the stimuli were mostly heard as tone 4. This salient Fo movement cue was overridden only by very high absolute Fo values, which favored tone 1 percepts.

Table 12. Predominant response categories (with percentages exceeding 20%) for all combinations of Fo movements and heights. Numbers in parentheses represent onset frequencies (for Fo height) and the difference between onset and offset frequencies (for Fo movement).

[The body of this table is garbled in the source scan. One axis lists the 15 Fo movements by decreasing onset-offset difference in Hz: 2 (45), 23 (41.5), 25 (30), 3 (29), 12 (27.5), 24 (25.5), 35 (17.5), 13 (15), 34 (13), 5 (6), 15 (3.5), 1 (1), 45 (1.5), 14 (-1), 4 (-3); the other lists the 15 Fo heights by decreasing onset in Hz: 2 (150), 12 (140), 25 (131.5), 1 (130), 23 (129.5), 24 (126), 15 (121.5), 13 (119.5), 14 (116), 5 (113), 35 (111), 3 (109), 45 (107.5), 34 (105.5), 4 (102). The cell entries (tone categories exceeding 20% of responses, in rank order) are not reliably recoverable.]

1 "2

Page 154: ED 321 318 - ERIC · Vial L Hanson Katherine S. Harris Leonard Kate Rena Arens Krakow Andrea G. Levitt Alvin M. Liberman Isabelle Y. Liberman Diane Lilt-Martin Leigh Lisker' Anders

Cues to the Perception of Taiwanese Tones 145

Duration as a cue

Finally, we consider the possible role of duration as a cue to tonal distinctions. Because the two falling tones, 2 and 3, are characterized by shorter durations than the other tones (see Table 1), a short stimulus duration was expected to be a cue to the falling contour category. In addition, shortening the duration of a falling Fo movement increases its slope, which may further enhance the "falling" percept. This may favor tone 2 over tone 3 responses for clearly falling contours, considering the steeper slope of the tone 2 movement (cf. Figure 1).

The results were analyzed by comparing the response percentages for the short (75 ms) and long (145 ms) durations of each stimulus whose duration was varied. (The intermediate durations were not considered.) Table 13, which is arranged in the same way as Table 12, lists all response categories in which a change of more than 10% occurred as stimulus duration was shortened. (The largest change was 35%.) Plus signs indicate increases, minus signs decreases, in order of absolute magnitude. The changes are often complementary for two tonal categories, with one response increasing at the expense of another.
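In outline, this analysis reduces to differencing two response distributions and flagging shifts beyond the 10% criterion; a sketch with invented illustrative percentages:

    short = {2: 40.0, 3: 35.0, 5: 20.0, 1: 5.0}   # responses (%) at 75 ms
    long_ = {2: 25.0, 3: 30.0, 5: 38.0, 1: 7.0}   # responses (%) at 145 ms

    changes = {tone: short[tone] - long_[tone] for tone in short}
    flagged = {t: d for t, d in changes.items() if abs(d) > 10}
    print(flagged)   # e.g. {2: 15.0, 5: -18.0}: falling up, level down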

Our expectations were that falling tone (2 and 3) responses would generally increase and level tone (1 and 5) responses would decrease as a consequence of stimulus shortening, but that for strongly falling Fo movements tone 2 responses might increase at the expense of tone 3 responses. The overall pattern of changes supports the first prediction: There were 11 increases in tone 2 responses versus 3 decreases, 13 increases in tone 3 responses versus 3 decreases, one increase in tone 5 responses versus 12 decreases, and no increase in tone 1 responses versus 3 decreases. The second, more specific prediction was less well supported: In the left half of Table 13, increases in tone 2 responses are frequent, but there are also some decreases, and changes in tone 3 responses are inconsistent, though usually complementary to the changes in tone 2 responses. On the whole, it appears that duration did have a role as a secondary cue for the falling-nonfalling tone distinction.

DISCUSSION

From these results, it is quite clear that Fo is the most prominent perceptual cue for tonal contrasts in Taiwanese, as in all other tone languages studied so far. The present data further show that either dimension of Fo (height or movement) can emerge as the dominant factor in cueing a particular tonal contrast, depending on the tonal patterns to be differentiated.

Table 13. Response categories showing changes in excess of 10% following a change in stimulus duration from 145 to 75 ms. All relevant combinations of Fo heights and movements are shown.

[The body of this table is garbled in the source scan; the cell entries (tone categories with changes exceeding 10%, marked + for increases and - for decreases) are not reliably recoverable.]


In general, the more pronounced an acoustic difference between two tones is, the more likely it will be an important cue in perceiving the contrast. Thus, Fo height was found to be a main cue in the distinction between tones with highly similar contours but different registers, whereas distinctions between tones with dissimilar contours were mainly cued by Fo movement.

On the whole, Fo movement seems to be perceptually more important than Fo height for Taiwanese listeners. This is especially evident in the contrasts between tones 1 and 4, and tones 5 and 4, which in principle could have depended on Fo height. One reason for this finding may be that Fo movement is a more stable dimension than Fo height, which varies across speakers and is often ambiguous when no contextual reference is provided. (Note that tones 2 and 3, and tones 1 and 5, tend to be confused in isolated syllables.) However, it has also been observed that linguistic background can affect the perception of Fo patterns (Gandour, 1978, 1983). Thus it could be that Fo height and movement are given different weights in languages with different tonal inventories, even when similar tonal contrasts are being perceived. For instance, in Cantonese, Fo height is rather important because four out of six tones are relatively similar in Fo movement (Vance, 1977). On the other hand, all four tones of Mandarin are dissimilar in Fo movement (Howie, 1976). Taiwanese represents an intermediate case. Gandour (1983) found that Cantonese speakers rely more heavily on Fo height, and less on Fo movement, than do speakers of Mandarin and Taiwanese when judging various Fo patterns. Our finding of the relative importance of Fo movement in the perception of Taiwanese tones is not inconsistent with Gandour's findings.

One purpose of perceptual studies such as the present one is to go beyond linguistic nomenclature, which omits acoustic detail, and beyond phonetic studies, which describe but do not establish the perceptual relevance of these detailed aspects. Thus we have shown that the Taiwanese high falling and mid falling tones, which nominally differ only in register, also exhibit some perceptually relevant differences in Fo movement. We have also shown that the striking difference in duration between falling and nonfalling tones can provide distinctive information in ambiguous cases. Bailey and Summerfield (1980), in their thorough studies of the perception of segmental phonetic distinctions, have argued that any systematic difference in acoustic properties between segments can be shown to be perceptually relevant. This generalization also seems to apply to the perception of tonal distinctions.

REFERENCES

Abramson, A. S. (1962). The vowels and tones of standard Thai: Acoustical measurements and experiments. Bloomington: Indiana University Research Center in Anthropology, Folklore, and Linguistics.

Abramson, A. S. (1972). Tonal experiments with whispered Thai. In A. Valdman (Ed.), Papers in linguistics and phonetics to the memory of Pierre Delattre (pp. 31-44). The Hague: Mouton.

Abramson, A. S. (1975). The tones of Central Thai: Some perceptual experiments. In J. G. Harris & J. R. Chamberlain (Eds.), Studies in Tai linguistics in honor of William J. Gedney (pp. 1-16). Bangkok: Central Institute of English Language.

Bailey, P. J., & Summerfield, Q. (1980). Information in speech: Observations on the perception of [s]-stop clusters. Journal of Experimental Psychology: Human Perception and Performance, 6, 536-563.

Chiang, H. T. (1967). Amoy-Chinese tones. Phonetica, 17, 100-115.

Gandour, J. (1978). The perception of tone. In V. A. Fromkin (Ed.), Tone: A linguistic survey (pp. 41-76). New York: Academic Press.

Gandour, J. (1983). Tone perception in Far Eastern languages. Journal of Phonetics, 11, 149-175.

Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones. Cambridge, UK: Cambridge University Press.

Lin, H.-B. (1988). Contextual stability of Taiwanese tones. Doctoral dissertation, University of Connecticut, Storrs.

Lin, M.-C. (1987). The perceptual cues of tones in standard Chinese. Proceedings of the Eleventh International Congress of Phonetic Sciences, Vol. 1 (pp. 162-165). Tallinn, Estonia, U.S.S.R.: Academy of Sciences of the Estonian SSR.

Rumyantsev, M. K. (1987). Chinese tones and their duration. Proceedings of the Eleventh International Congress of Phonetic Sciences, Vol. 1 (pp. 166-169). Tallinn, Estonia, U.S.S.R.: Academy of Sciences of the Estonian SSR.

Tseng, C.-Y. (1981). An acoustic phonetic study of tones in Mandarin Chinese. Doctoral dissertation, Brown University, Providence, RI.

Vance, T. J. (1977). Tonal distinctions in Cantonese. Phonetica, 34, 93-107.

Zee, E. (1978). Duration and intensity as correlates of Fo. Journal of Phonetics, 6, 221-225.

FOOTNOTES

*Language and Speech, 32, 25-44 (1989).

†Also Department of Linguistics, University of Connecticut, Storrs. Now at the Lexington Center, Inc., Jackson Heights, New York.

1Throughout this paper, we refer to Fo characteristics of phonological tones as Fo register and contour, but to the corresponding phonetic dimensions as Fo height and movement. Register and contour thus are characterized in discrete terms (high, mid, low; rising, level, falling), whereas height and movement are continuously variable and are described in acoustic terms. The term "register" is not intended to denote changes in vocal register.

2According to traditional classification, Taiwanese has five long tones and two short tones. The former occur in syllables ending with vowels or nasals, whereas the latter occur in checked syllables only (see Chiang, 1967). Because only an open syllable was used as the carrier in the present study, short tones were excluded from consideration.

3We did not vary amplitude characteristics of the stimuli, to keep the design within bounds. To confirm that Taiwanese tones could be identified in isolated syllables, tones produced on the syllable /do/ by a native speaker were presented to native listeners for identification in a pilot study. Although confusions occurred between tones 2 and 3, and between tones 1 and 5, average performance was 87% correct.

4As a result, variations in Fo height were restricted for the tone 3-5 and 4-5 contrasts. Alternatively, we could have chosen the Fo midpoint or the average Fo as our measure of Fo height. In that case, however, height variations would have been restricted for the tone 1-2, 2-3, and 4-5 contrasts. Our choice of Fo onset frequency as the relevant measure is consistent with phonological terminology for tones, which usually mentions onset register and direction of contour, such as "high falling".

5Figure 2 actually shows a tonal pair of different original durations, whose Fo movements have been scaled to a common duration. The figure is slightly inaccurate because it shows curvilinear Fo movements for both original tones. In reality, after two temporally contrasting tones had been scaled to a common duration, one or both of the Fo movements were linear.

6For example, each original tone occurred four times because it occurred in four different stimulus subsets (i.e., in contrast with four other tones). The slightly different response percentages to physically identical stimuli in different tables (below) derive from the fact that each stimulus subset was treated separately in the data analysis.

7Note that /do/ is not a possible syllable in Mandarin, so second-language interference seemed unlikely.


Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 148-161

Physical Interaction and Association by Contiguity in Memory for the Words and Melodies of Songs*

Robert G. Crowder,† Mary Louise Serafine,† and Bruno H. Repp

Six experiments investigated two explanations for the integration effect in memory for songs (Serafine et al., 1984, 1986). The integration effect is the finding that recognition of the melody (or text) of a song is better in the presence of the text (or melody) with which it had been heard originally than in the presence of a different text (or melody). One explanation for this finding is the physical interaction hypothesis, which holds that one component of a song exerts subtle but memorable physical changes on the other component, making the latter different from what it would be with a different companion. Experiments 1, 2, and 3 investigated the influence that words could exert on the subtle musical character of a melody. A second explanation for the integration effect is the association-by-contiguity hypothesis: that any two events experienced in close temporal proximity may become connected in memory such that each acts as a recall cue for the other. Experiments 4, 5, and 6 investigated the degree to which both successive and simultaneous presentations of spoken text with hummed melody would give rise to association of the two components. The results gave encouragement for both explanations and are discussed in terms of the distinction between encoding specificity and independent associative bonding.

Stimuli obviously have multiple features. Two examples are that ordinary objects have both color and shape and that songs have both melody and text. Questions about memory representations of these theoretically separable but seemingly related components of a song, melody and text, motivated our earlier investigations (Serafine, Crowder, & Repp, 1984; Serafine, Davidson, Crowder, & Repp, 1986). We hypothesized that a song might be represented in memory in three ways: (1) independent storage of components (the separate entities perceived and stored so that memory for one is uninfluenced by the other); (2) holistic storage (the two components so thoroughly connected in perception and memory that one is remembered only in the presence of the other); and (3) integrated storage (the two components related in memory such that one component is better recognized in the presence of the other than otherwise).

This research was supported by NSF Grant GB 86 08344 to R. Crowder and by NICHD Grant HD-01994 to Haskins Laboratories. The authors are grateful for the assistance of William Flack in testing subjects and to Shari R. Speer for discussion of earlier versions of this paper.


The holistic hypothesis is obviously false in the general case since people often recognize the melodies of familiar songs when they hear them performed on solo instruments, or with unfamiliar verses. What this informal observation leaves open, however, is whether the memory representation consists of independent or integrated components.

In earlier studies we reported evidence for what we called an integration effect in memory for melody and text. Using a recognition task, we found that melodies were better recognized when heard with the same words (as originally heard) than with different words, even when the different words fit the melody and were equally familiar to the subject. Similarly, we found that the words of songs were better recognized in test songs containing the original melody than in those containing a different but equally familiar melody. The procedure we employed was as follows: Subjects heard a serial presentation of up to 24


unfamiliar folksong excerpts, each heard only once. A recognition test followed immediately, in which subjects were typically asked to indicate, for each excerpt, whether they had heard exactly that melody (or text) before, ignoring the current text (or melody). The test excerpts consisted of old songs (exactly as heard in the presentation) and various types of new songs (for example, old melody with new words), including a type we termed mismatch songs, that is, an old melody with old words that had been paired with a different melody in the original presentation. The critical comparison, of course, was between melody recognition when old songs were heard and when mismatch songs were heard, that is, when the melody was paired with its original companion as opposed to a different, but equally familiar, companion. This comparison, then, avoided the potentially biasing effect that completely new, unfamiliar words would have on recognition of a truly remembered melody. What we have termed the integration effect is the finding that both melody and text recognition were better in the case of old songs than in mismatch songs. We concentrated on the facilitating effect of identical words on recognition of melodies because recognition of words was in some cases almost at ceiling.

The effect was robust. It was not eliminated by instructing the subjects, on their initial hearing, to focus attention on the melody (and ignore the words), nor was it eliminated by hearing a different singer on the recognition test than had sung in the original presentation (Serafine et al., 1984). Moreover, the effect was not due to a particular experimental artifact, the potentially confusing effect of hearing the melody with seemingly "wrong" words; the wrong words did not make melody recognition suffer, as against an appropriate baseline, but the right words facilitated it (Serafine et al., 1986).

The integration effect was not accounted for by a semantic connotation imposed on the melody by the meanings of the words (Serafine et al., 1986), because the integration effect was found even in songs employing nonsense syllables on presentation and test. A melody heard only once, then, was better recognized in the presence of its original nonsense text than with different but equally familiar nonsense. This latter observation seems inconsistent with a meaning-interaction hypothesis that might have considerable intuitive appeal.

In the present studies, we explore further the source of the integration effect. Two hypotheses, not necessarily incompatible, are under test here: the physical interaction hypothesis and the association-by-contiguity1 hypothesis. The first of these asserts that when a song is sung, the words impose subtle effects upon the melody notes, slightly affecting their acoustic properties such as the onsets, durations, and offsets. We have termed these effects "submelodic" because they would leave unaffected the pitches as they would be notated or conceived in composition. For example, some words might impose a staccato articulation and others a legato phrasing. If this hypothesis were correct, then a melody sung with one particular text would in fact be a somewhat different melody than it would be when sung with another text. It would not be surprising to find that the melody was then better recognized with the same words both times than with changed words. A similar argument could be made for texts.

The association-by-contiguity hypothesis asserts that two events that occur in close temporal proximity (contiguously or simultaneously) tend to be associated in memory, though neither was necessarily changed by virtue of having entered into this association. If this hypothesis were correct, then in the limit text and melody would be just as well associated if they were experienced simultaneously but separately (e.g., words spoken and hummed melodies) as if they were given as a song.

In the present research, the first three experiments addressed the submelodic hypothesis (a special case of the physical-interaction hypothesis in which words alter musical properties of the melody) and the latter three experiments addressed the association hypothesis. All experiments employed our usual general procedure: Subjects heard folksong excerpts followed immediately by a melody recognition test where test items contained controlled combinations of song components. All experiments used variations of the musical materials and design described below.

General Method

Musical materials were based on 40 American folksongs (from Erdei, 1974; see Serafine et al., 1984, 1986) which, in earlier experiments, we found were virtually all unfamiliar to our subjects. There were 20 pairs of song excerpts, each pair selected so that melodies and texts were interchangeable, having rhythmic compatibility. Figure 1 shows such a pair.


Interchangeability of melodies and texts was crucial to the construction of test items in which a song contained a different melody or text than that heard originally in the presentation. Thus, each text contained a stress pattern suitable for either melody, and both texts within a pair contained the same number of syllables. The exceptions were Song Pairs 11 and 17, where one text was shorter by one syllable and required the common "slur" across two tones (see "sleep" in Figure 1, Melody B). Given interchangeable components, each pair potentially yielded five types of test items, examples of which are shown in Figure 2.

[Figure 1 (musical notation): Melody A with Text a, "When the train comes along, when the train comes along," and Melody B with Text b, "Hushabye, don't you cry, go to sleep, little babe," each melody also shown with the other's text.]

Figure 1. Sample pair of songs with interchangeable texts. (Aa and Bb denote original songs; Ab and Ba denote derivatives.)

[Figure 2 (musical notation): two panels of notated song excerpts, labeled SAMPLE PRESENTATION ITEMS and SAMPLE TEST ITEMS.]

Figure 2. Sample presentation and test items. (Code for test items: a - new melody/new words; b - old melody/new words; c - new melody/old words; d - old melody/old words, mismatched; e - old songs.)


The five types were: a - new melody/new words; b - old melody/new words; c - new melody/old words; d - old melody/old words, mismatched; e - old songs.

In addition, test items might consist of old or new melodies alone, that is, hummed melodies without words. The present experiments employed three to five types of test items, but in each case the critical comparison was between old songs and mismatch songs, where the latter items allowed us to test recognition of one component in the presence of a different component that had nevertheless been heard in the presentation and was equally familiar.

The songs were sung in the alto range by the second author, recorded onto a master tape, and dubbed onto sets of experimental tapes with a 5-sec interval of silence between presentation items and a 10-sec response interval after each test item. A silent metronome set at one beat per sec facilitated performance at an even tempo, and a piano tone (not heard by the subjects) ensured pitch accuracy at the start of each song.

Subjective tempos across the songs were not uniform, however, due to normal rhythmic and metric variations (e.g., "double time"). The presented songs ranged in total duration from about 4 to about 10 sec, with a mean of 6.4 and a standard deviation of 2.01. All songs used C as the tonic, although there were variations in mode (Dorian, major, and minor) and starting tone. Only slight alterations were made in the original folk melodies or texts (e.g., "across" changed to "'cross") in order to ensure rhythmic interchangeability of materials.

The same general design was used in all experiments. The presentation and test sequences always utilized the song pairs in the same order. On the presentation tapes, half the songs were melodies with their original folksong texts (type Aa in Figure 1) and half used the borrowed, interchangeable text (type Ab in Figure 1). Each mismatch item on the test tapes required two songs in the presentation sequence (since the melody of one would be tested with the text of the other). Whenever two such songs occurred in the presentation, they followed one another immediately on the tape. Natural sources of variation among these songs include length, nature of the melody, tempo, and subject matter of the text, to name only a few characteristics. These factors were completely controlled, however, by counterbalancing across different subject groups.

Experiment 1

The aim of Experiment 1 was to test the submelodic hypothesis by employing, on the test tape, songs that contained different texts but the same phonetic and prosodic pattern as those employed in the original presentation. We derived phonetically similar texts by translating each text into a corresponding nonsense text, where vowels were left intact and consonants were changed to a reasonably close phonetic neighbor (/b/=/g/; /k/=/t/, etc.). The presentation consisted of songs with nonsense texts, and the test consisted of obviously different but phonetically similar texts, that is, the real words. If the submelodic hypothesis were correct, the integration effect should be obtained. That is, a melody should be better recognized when it appears with words that are phonetically similar to the nonsense words with which it was originally heard than with words whose phonetic derivatives are equally familiar but had been heard originally with a different melody.

Subjects heard a presentation of 24 songs with nonsense texts followed by a 20-item melody recognition test containing four each of the following types of items:

(a) "old songs" (old melody with real words thatare phonetically similar to the nonsense textheard with that melody in thepresentation).2

(b) "mismatch songs" (old melody with realwords that are phonetically similar to anonsense text heard with a different melodyin the presentation);

(c) new melody/"old words" (new melody with real words that are phonetically similar to a nonsense text heard in the presentation);

(d) old melody hummed (that is, without words);

(e) new melody hummed.

The main question was whether melody recognition would be better in the "old song" than in the "mismatch" condition. The hummed test items provided a baseline for melody recognition.

Method

Materials

Using the songs described under General Method, each text was translated into phonetically similar nonsense using the rules described in Experiment 1 of Serafine et al. (1986). The following are examples of translated texts written following English orthography:


Original: Cobbler, cobbler, make my shoe.
Nonsense: Togglue, Togglue, nate nie choo.
Original: Cape Cod girls they have no combs.
Nonsense: Tade top berf shey jaze mo tong.

Design

Five sets of presentation and test tapes were constructed, each administered to a different group of subjects. Presentations consisted of 24 song excerpts with nonsense texts, and test sequences consisted of 20 items containing real words, where each of the five types of items occurred 4 times. Across the five subject groups, each presentation item was tested in each of the five conditions. For example, the first presentation item was tested as an "old song" in one group, as a "mismatch song" in another group, as an old melody hummed in another group, and so on. Because each "mismatch song" required hearing two songs in the presentation, the presentation tapes consisted of 20 song excerpts plus 4 additional ones for the "mismatch" condition.
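The rotation of each presentation item through every test condition across subject groups is a Latin-square assignment. A minimal sketch of one way to generate such an assignment (the function name and labels are ours, not from the original report):

    conditions = ["old song", "mismatch song", "new melody/old words",
                  "old melody hummed", "new melody hummed"]

    def latin_square(n_items, conditions):
        # assignment[group][item]: each item rotates through every
        # condition exactly once across the groups.
        n_groups = len(conditions)
        return [[conditions[(item + group) % n_groups]
                 for item in range(n_items)]
                for group in range(n_groups)]

    assignment = latin_square(20, conditions)
    # Item 0 is an "old song" for group 0 but a "mismatch song" for group 1.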

Procedure

Testing was conducted individually in a quiet laboratory with tapes heard over loudspeakers. Subjects were instructed to listen to a presentation of 24 excerpts that would sound like folksongs except that the texts had been changed to nonsense. They were told that their "memory would be tested later" but not informed that only melody recognition would be tested. The test immediately followed the presentation. Subjects were told that test items would consist of hummed melodies or songs with real words, but in all cases they were only to indicate whether they had "heard that exact melody before, that is, just the musical portion." Subjects indicated "yes" or "no" on the answer sheet and gave a confidence rating that ranged from 1 to 3.

Subjects

Twenty Yale undergraduates with undetermined levels of musical training were equally divided among the five groups.

Results and Discussion

Yes/no responses with confidence ratings were translated into single scores with a theoretical range of 1 to 6, where 1 represents a very confident no (did not hear melody) and 6 represents a very confident yes (did hear melody). The mean ratings for "old song," "mismatch song," new melody/"old words," old hummed melody, and new hummed melody, respectively, were 3.99, 4.44, 3.56, 4.00, and 2.92. An analysis of variance with subjects as the sampling variable showed conditions to be a significant source of variance, F(4,76)=9.45, p < .001. The Newman-Keuls procedure was used to identify which comparisons produced the significant overall effects. Evidence that melody recognition was above chance was provided by the fact that the mean rating for old melodies hummed (4.00) exceeded that for new hummed melodies (2.92), p < .01, as well as by the fact that the rating of "mismatch songs" (4.44) was reliably greater than the condition with new melodies and "old words" (3.56). However, "mismatch songs" generated a higher mean rating (4.44) than did "old songs" (3.99), contrary to our expectations based on the submelodic interpretation of the integration effect. Thus, phonetically similar "old song" texts did not enhance recognition for the original melodies.
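The mapping from a yes/no response plus a 1-3 confidence rating onto the 1-6 scale can be written out explicitly; a sketch under the natural reading that a confident "no" scores lowest and a confident "yes" scores highest (the function name is ours):

    def recognition_score(said_yes, confidence):
        # confidence runs from 1 (unsure) to 3 (very confident);
        # returns 1 (very confident no) ... 6 (very confident yes).
        assert confidence in (1, 2, 3)
        return 3 + confidence if said_yes else 4 - confidence

    assert recognition_score(False, 3) == 1   # very confident no
    assert recognition_score(True, 3) == 6    # very confident yes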

In this experiment, then, the submelodic hypothesis was not supported. No evidence emerged that the nonsense words in the presentation imposed effects on the melodies which would allow the phonetically similar real words to enhance melody recognition. As a test of the submelodic hypothesis, however, this experiment seemed in retrospect to have been compromised by the fact that subjects heard real words on the test after having heard nonsense in the presentation. Possibly the surprise of a full semantic experience after originally studying nonsense may have been distracting. In Experiments 2 and 3 we addressed this problem.

Experiment 2

One aim of the present experiment was to verify that the integration effect could be obtained with the newly constructed nonsense texts that would be necessary for Experiment 3, where the submelodic hypothesis was tested again. In an earlier study (Experiment 1 of Serafine et al., 1986) we had shown, as noted above, that the integration effect was robust with nonsense texts of the sort used in Experiment 1 of the present paper. In the present and following experiments, however, somewhat different rules were employed for the construction of nonsense texts, and we sought to verify that the resulting new materials would give rise to the integration effect. Following a presentation of song excerpts with nonsense words, a melody recognition test employed only three types of test items:

(a) old songs (old melody with old nonsense words, exactly as heard in the presentation);

(b) mismatch songs (old melody with old nonsense words that had been sung to a different melody in the presentation);

(c) new melody/old words.


The critical comparison was that between the old song and mismatch conditions. The main prediction was that melody recognition would be enhanced by the presence of the original old nonwords (in the old songs) over that obtained with the different but equally familiar nonwords (in the mismatch songs).

Method

Materials

The design of Experiment 2 required only eighteen of the 20 song pairs we had available. (Two pairs were omitted on the grounds that they had proven at least somewhat familiar to some subjects in Experiment 1.) Because, in Experiment 3 reported below, two nonsense texts were required for each real text, it was necessary to employ new rules for translating real words into nonsense. The new rules, as follows, were similar to, but more detailed than, those of Experiment 1 and allowed fewer deviations. For example, the voiced/unvoiced distinction was preserved across transformations:

(1) Vowels remain the same, and the following vowel-liquid sequences are treated as intact: /er/ as in Mary, /ar/ as in far, /Il/ as in will, /or/ as in lore, /oi/ as in boy, /au/ as in how, /ei/ as in pail, /al/ as in doll, /ol/ as in awl, /ʌnz/ as in runs.

(2) Consonants are interchanged according to the following list of phonetic similarities. For example, if /b/ occurs in a real word, the two corresponding nonsense words use /d/ and /g/, respectively:

/b/=/d/=/g/
/n/=/m/=/l/
/r/=/l/=/j/ or /w/; /w/=/j/=/r/ or /l/
/p/=/t/=/k/
/θ/=/ð/=/f/ or /s/; /ʃ/=/f/=/θ/ or /s/; /h/=/s/=/f/; /s/=/f/=/ʃ/
/tr/=/kw/=/pl/
/pr/=/tw/=/kl/
/kr/=/tw/=/pl/
/st/=/sk/=/sp/
/sl/=/jw/=/fr/
/skw/=/str/=/spl/
/bl/=/dw/=/gr/
/sp/=/st/=/sk/
terminal /n/ = terminal /m/ = terminal /ŋ/

Special cases of translated vowel/consonant combinations:

/ərl/=/ərm/=/ərn/ (e.g., girl = berm, dern)
/ir/=/il/=/in/ (e.g., here = seal = feen)

/end/=/emd/=/emb/
/ʌn/=/ʌm/=/ʌŋ/
terminal /ɔŋ/=/ɔn/=/ɔm/ (e.g., song = fawn, shawm)
terminal /nt/=/mt/=/ŋt/

(3) Interior /ər/ or /ən/ is treated as a vowel, but in terminal position is interchanged as follows: /ər/=/ən/=/əl/, as in anger = often = able.

(4) A terminal /s/ or /z/, when a plural marker, may be retained (untranslated) if the resulting nonsense is too difficult to pronounce.

(5) The sounds /tʃ/ and /dʒ/ are omitted from all real texts because three suitable phonetic correspondences do not exist. Thus, minor changes in some real texts were made (e.g., Joe changed to Moe, chase to run).

The following is an example of a translated text, written in the form (regular orthography) used by the singer:

Original: Cobbler, cobbler, make my shoe.
Nonsense 1: Poggrel, poggrel nate nie fop.
Nonsense 2: Toddwen, toddwen lape Ee thoo.

Only one set of nonsense texts was used in this experiment.
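The translation rules amount to a phoneme-substitution table in which each real consonant has two nonsense counterparts. A toy sketch of the mechanism (the table below covers only a few of the correspondences above, and the phoneme spellings are illustrative):

    # Each consonant maps to its Nonsense 1 and Nonsense 2 substitutes.
    SUBSTITUTIONS = {
        "b": ("d", "g"), "d": ("g", "b"), "g": ("b", "d"),
        "p": ("t", "k"), "t": ("k", "p"), "k": ("p", "t"),
    }

    def to_nonsense(phonemes, variant):
        # Vowels and any unlisted symbol pass through unchanged.
        return tuple(SUBSTITUTIONS.get(p, (p, p))[variant] for p in phonemes)

    word = ("k", "a", "b")              # a toy phoneme string
    print(to_nonsense(word, 0))         # -> ('p', 'a', 'd')
    print(to_nonsense(word, 1))         # -> ('t', 'a', 'g')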

Design

The design was comparable to that of Experiment 1. Three sets of presentation and test tapes were administered to different sets of subjects. Across the three subject groups, each presentation item was tested in each of the three conditions: old song, mismatch, and new melody/old words. The presentation tapes consisted of 24 songs, and test sequences consisted of 18 items, six each of the three conditions.

Procedure

The testing procedure was comparable to that of Experiment 1. Subjects were instructed to listen to a presentation of folksong excerpts with nonsense texts, were told that their "memory would be tested later," and following the presentation were given the melody recognition test in which they were to indicate whether they had "heard this exact melody before, that is, just the musical portion." They were not told what types of items to expect on the test except that the nonsense folksongs would be similar to those on the presentation.

Subjects

Fifteen Yale undergraduates with undetermined levels of musical training were equally divided among the three groups.


Results and Discussion

As in Experiment 1, responses were translated into 6-point ratings where 1 represents a very confident no (did not hear melody) and 6 represents a very confident yes (did hear melody). Mean ratings for the old song, mismatch, and new melody/old words conditions were 4.85, 3.63, and 2.59, respectively. The results of two omnibus analyses of variance were significant: with subjects as the sampling variable, F(2,28)=42.66, p < .001, and with the 18 test items as the sampling variable, F(2,34)=41.76, p < .001. Newman-Keuls tests revealed that melody recognition was significantly better in the old song than in the mismatch condition, both across subjects (p < .01) and across items (p < .01).
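The subjects analysis here is a one-way repeated-measures ANOVA over the three conditions. For readers who want to run this kind of test on their own data, a sketch using statsmodels (the data frame below is a placeholder seeded with the reported condition means, not the study's data):

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(0)
    data = pd.DataFrame({
        "subject": np.repeat(np.arange(15), 3),
        "condition": np.tile(["old song", "mismatch", "new/old words"], 15),
        # placeholder ratings centered on the reported condition means
        "rating": np.tile([4.85, 3.63, 2.59], 15) + rng.normal(0, 0.3, 45),
    })
    result = AnovaRM(data, depvar="rating", subject="subject",
                     within=["condition"]).fit()
    print(result)   # F and p for the condition main effect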

Thus the integration effect was confirmed with these new materials, verifying that the presence of the original old words, even nonsense words devoid of semantic meaning, facilitates melody recognition over that obtained with the different but equally familiar words in the mismatch songs. Besides vindicating our new stimuli, the results of Experiment 2 provide a welcome replication of one of our most important previous results: in this new experiment, mean ratings for the old song and mismatch conditions, respectively, were 4.85 and 3.63; corresponding means from Experiment 1 of Serafine et al. (1986) were 4.47 and 3.76.

Experiment 3

The aim of Experiment 3 was to retest the submelodic hypothesis, which had not been confirmed in Experiment 1. To avoid the potentially distracting use of both nonsense (at presentation) and real words (at test), which may have influenced the outcome of Experiment 1, we employed two different sets of phonetically derived nonsense texts, based on the same real words (which were never used in this experiment). The presentation consisted of folksong excerpts with nonsense texts. The test consisted of folksong excerpts whose texts were phonetically similar to those in the presentation but nevertheless were, in all cases, different nonsense. (As in Experiment 1, the phonetic derivative of an old song was called an "old song," etc.) Test items were of three types:

(a) "old songs" (old melody with nonsense wordsphonetically similar to the old nonsensetext);

(b) "mismatch songs" (old melody with nonsensewords phonetically similar to an oldnonsense text from a different song in thepresentation);

(c) new melody/"old words" (new melody with nonsense words phonetically similar to an old nonsense text).

If the submelodic hypothesis were correct, that is, if words impose subtle and memorable effects upon their melodies, then a melody should be better recognized when it is heard with nonsense words that are phonetically related to the nonsense with which that melody was originally presented than when heard with nonsense that is not phonetically related to the original. In other words, melody recognition in "old songs" should exceed that in "mismatch songs."

Method

Materials

The materials were those described under Experiment 2. Both sets of nonsense texts (phonetic derivatives of the original folksong texts) were employed.

Design

The design was comparable to that of Experiment 2, except that the test items, instead of comprising old songs, mismatch songs, and new melodies with old words, used the phonetically derived "old songs," "mismatch songs," and new melodies with "old words," where our quotation marks indicate that exact repetition of the verbal texts between learning and test never occurred. As in earlier experiments, counterbalancing across subject groups was employed to control for natural variations in the songs. The presentation consisted of 24 items and the tests consisted of 18 items.

After 12 of the 30 subjects had been tested, an inadvertent error in the test tapes was detected. Two song pairs contained faulty material for the condition new melody/"old words," although the other two conditions were correct. Thus, scores for those 12 subjects were based on four (instead of six) items in the new melody/"old words" condition.

Procedure

The procedure was analogous to that of Experiments 1 and 2. At test, subjects were told that the texts of the songs might sound similar to or different from those heard before, but they were to attend only to the melody and indicate recognition (yes or no) and a confidence rating on the answer sheet.

Subjects

Thirty adults with undetermined levels of musical training were paid to participate and were equally divided among the three groups.


Results and Discussion

As before, melody recognition ratings had a possible range of 1 to 6. Means for the "old song," "mismatch," and new melody with "old words" conditions were 4.26, 3.88, and 3.05, respectively. With subjects as the sampling variable, the result of an analysis of variance was significant, F(2,58)=28.18, p < .001, and Newman-Keuls tests indicated that melody recognition under the "old song" condition was significantly better than that under the "mismatch" condition, p < .05.

With items as the sampling variable, an analysis of variance was performed on means generated only by the 18 subjects who had completed all items in all conditions. Those means, for the "old song," "mismatch," and new melody/"old words" conditions respectively, were 4.10, 3.71, and 3.04. The main effect was significant, F(2,34)=12.02, p < .001, and a priori comparisons involving only the first two conditions revealed significance at the .02 level. (The results of post hoc tests were not significant, however.)

The results of the present experiment show that the integration effect is obtainable with phonetically similar nonsense used at test. One plausible explanation for this result is that words exert specific, albeit subtle, and memorable effects on the melodies with which they are sung. These submelodic effects include the manner in which consonants (perhaps also vowels) affect the onset, duration, and offset of particular melody tones. In our view, it seems indisputable that words exert variable effects on melody tones, as can be easily imagined, for example, in the case where two tones accompany the words "tip-top" as opposed to "ho-hum."

We think it not an accident that the present experiment showed evidence favorable to the submelodic hypothesis, whereas earlier efforts with a similar experimental design did not. The rules for deriving phonetically similar nonsense texts were more fastidious here than those used before: for example, in these new materials we respected the voiced/voiceless distinction more consistently than under the old rules. Consonants with stop closures were distributed equally in the original and derived versions, too. These distinctions are just the sort that would be expected to underlie a submelodic effect of words on music.

Other interpretations of the integration effect, for example those to be considered below, might also be consistent with the evidence adduced here for the submelodic hypothesis. Comparing Experiments 2 and 3 of the present series, we note a smaller, and statistically weaker, integration effect in the latter, with the derived nonsense words, than in the former, with the very same nonsense texts presented at learning and test. This is as it should be, by any commonsense view, for no scheme for deriving "similar" phonetic texts could possibly be as faithful a reinstatement as complete identity. On the other hand, we should not exaggerate the triumph of the submelodic hypothesis: at most, we can claim that we have shown conclusively that some such factor is operating somehow in our integration experiments, not that it is an answer as to the complete cause of the effect.

Introduction to Experiments 4 through 6

The remaining three experiments investigated the degree to which the melodies and texts might be associated in memory because of their close temporal proximity, as successive events in Experiments 4 and 5, and as simultaneous events in Experiment 6. These experiments address what we referred to above as the association-by-contiguity hypothesis. The term association, by itself, may connote many things theoretically, such as rote learning, Pavlovian conditioning, and pre-cognitive, antediluvian mists of antiquity. However, its denotation is theoretically empty: it simply stands for an experimental fact, that events A and B stand in a particular empirical relationship because of their history of co-occurrence. The challenge for theory is to rationalize the circumstances necessary for that association to be formed and the nature of the bonding thereby achieved. Thus, our integration result illustrates some form of association, without doubt. The submelodic mechanism, for which we adduced some support in Experiments 1 to 3, is not strictly an associative mechanism at all, but rather an effect of one element upon the nature of the other, namely that the occurrence of A with B changed the physical nature of B. We now ask whether the temporal contiguity of A and B, words and melody respectively, is a sufficient condition for their association when no possibility exists for an overt influence of one upon the physical integrity of the other, as with the submelodic mechanism.

In considering the theory of associations we have relied upon the distinction in the respective psychologies of James Mill and of John Stuart Mill between mental compounding and mental chemistry (see Boring, 1957, chapter 12). In the former case two components retain their independent identities, yet are connected to one another. In the latter case the two components are themselves altered by each other's presence. Our concept of the association-by-contiguity of melody and text is like that of mental compounding: the melody and text are connected in memory, hence act as recall cues for each other, yet each is stored with its independent integrity intact. By contrast, the submelodic hypothesis in the first three experiments is consistent with a somewhat more chemical form of bonding, for a melody and text change each other physically when sung together in a song. A more purely mental chemistry could be an associative process in which, by co-occurrence in the mind, the memory representation of each is changed as against what it would have been without a particular companion.

More recently, a similar distinction has been articulated by Horowitz and Manelis (1972), albeit with a linguistic orientation, for adjective-noun phrases. They refer to a distinction between I-Bonding (where I stands for individual or independent) and J-Bonding (where J stands for joint). The former, illustrated by the phrases deep-chair or dark-wing, take their meaning as a phrase from the meanings of the constituent words. The latter, illustrated by high-chair or right-wing, possesses idiomatic meaning that transcends the meanings of the several constituents. As Horowitz and Manelis remarked (p. 222), I-Bonding owes allegiance to the British empiricist philosophers and J-Bonding to the Gestalt tradition. Tulving's work on recognition failure in episodic memory (Tulving, 1983) illustrates the same properties as J-Bonding, wherein an element of an association can be only poorly recognized but can be well recalled given the original associate as a cue. In many ways we believe that these issues are raised in their most stark relief when the two constituents, such as words and melodies, are fundamentally different cognitive elements than when intraverbal associations are at stake.

Experiment 4

The present experiment investigated the concept of contiguity as a sufficient condition for association. It assessed the degree to which a text could serve as the retrieval cue for a melody when the two had initially been heard in close temporal proximity (in this case successively), yet not as a proper song. Each component in Experiment 4 was strictly independent physically: texts were spoken and melodies were hummed. Using a technique similar to those reported above, we gave subjects a serial presentation of spoken texts, each followed by its corresponding melody, hummed, and then each text-melody pair was followed by a 10-sec interval of silence during which subjects were to "imagine" the song. A melody recognition test followed in which true (sung) songs and hummed melodies were heard. If subjects had managed to imagine the songs, as instructed during the original presentation, they should have behaved in the same way as subjects in our earlier experiments, who had actually heard the songs. The test items were of five types (quotation marks indicate a deviation from what was heard in the presentation):

(a) "olt songs" (the text and melody of one pairfroi.._ the presentation were sung together asa song in testing);

(b) "mismatch saws" (the text from one pairand the melody from a different pair weresung together as a st:tng);

(c) "new melody/old words" (the text from onepair heard in the presentation was sungwith a new melody);

(d) old melody hummed (exactly as heard in the presentation except not preceded by words);

(e) new melody hummed.

The main question was whether the "old song" condition would generate better melody recognition than would the "mismatch" condition. Such an advantage could derive from either of two processes, corresponding to I- or J-Bonding in the terminology of Horowitz and Manelis (1972). If the simple contiguity hypothesis were correct, we would expect that melodies and texts presented in close succession would be connected in memory, hence could act as recall cues for one another. Thus, in a melody recognition test consisting of true (sung) songs, the melody should be better recognized when heard with the text with which it is connected than with a different (mismatched) but equally familiar text. Likewise, as we said above, if subjects are able to fuse the melodies and text mentally, using the 10-sec "imagine" period as they were instructed, to create a song-like memory representation, then, too, the old song condition would produce better melody recognition than the mismatch condition, as in the earlier experiments. Thus, the conditions of this experiment could not permit a choice between these two hypotheses. That would require still further experimental arrangements. However, a positive outcome here would be a necessary, if insufficient, condition for either hypothesis.

Method

Materials

The 20 pairs of folksong excerpts were used with their real (not nonsense) words. Tape recordings of spoken texts and hummed melodies were made by the same female alto who was the singer in our earlier studies. Each spoken text generally followed the rhythm of its companion melody, consumed approximately the same amount of time, and had the character of poetic speech rather than normal, conversational speech.

Design

The design was comparable to that of Experiment 1, with five sets of tapes used to counterbalance the test conditions for each presentation item across five subject groups. The presentations consisted of 24 text-melody pairs, where the spoken text and hummed melody in each pair were separated by a one-sec interval and followed by 10 sec of silence. The test consisted of 20 items, four each assigned to the five conditions.

Procedure

The procedure was generally the same as that of the earlier experiments, except that subjects were told they would hear spoken texts and hummed melodies from simple folksongs and that they were to use the 10-sec interval to "imagine the words and melody together as though someone were singing them" and to "sing the song in your head." On the test, subjects were told to expect either true (sung) songs or hummed melodies and to indicate melody recognition as in the other experiments.

Subjects

Twenty-five adults with undetermined levels of musical training were divided equally among the five groups.

Results and Discussion

As in earlier experiments, melody recognition ratings had a theoretical range of 1 to 6. Mean ratings, respectively, for the "old song," "mismatch song," "old words/new melody," old melody hummed, and new melody hummed conditions were 4.24, 4.08, 3.63, 4.28, and 3.23. An analysis of variance with subjects as the sampling variable gave a statistically significant main effect of condition, F(4, 96) = 5.38, p < .001. Newman-Keuls tests showed that melody recognition, on its own, was better than chance; there was a significant difference in the baseline comparison between the means for old and new hummed melodies (4.28 and 3.23, respectively), p < .05. This conclusion is bolstered by the fact that the melodies of "old songs" yielded a higher mean rating (4.24) than did "new melodies with old words" (3.63), p < .05. However, no integration effect for texts and melodies occurred. That is, the mean rating for "old songs" (4.24) was not significantly higher than that for "mismatch songs" (4.08). There was, then, no advantage for melody recognition conferred by the presence of the original words over different but equally familiar words.

Thus, we can report no evidence that a melody and text heard in succession are better recognized later in each other's presence, even when instructions had been given to imagine the two presentation components as a song. Several reasons could possibly underlie the failure to find an association effect here. The problem may have been in the generation process, for one thing: subjects may have been simply unable to imagine the combined components as a unified song. Or, the imagination instructions themselves may somehow have served to distract the subjects from encoding the two components even as "normal" paired associates. Further, in this experiment an unprecedented inconsistency existed between what was heard in input (successive spoken speech and hummed melodies) and what was tested for recognition in output (sung songs); this may have been distracting. And of course it is possible that spoken and hummed stimuli of the sort employed here are not conducive to either a process of song generation or of contiguous associative bonding.

A special circumstance introduced by spoken speech and hummed melodies is the introduction of elements in each stream that are foreign to the identity of these as components within normal songs: the prosody of normal speech necessarily introduces intonation gestures (phrasal declination, for example) that would not be compatible with the sung version. The act of humming, likewise, cannot but introduce nasal and vowel segments that might otherwise not reside in the spoken text. Therefore, our abstraction of the spoken and melodic streams in the methodology of this set of studies is not an absolutely neutral operation. As so often, negative results are ambiguous, but positive results (see below) speak with considerably greater force.

In our next experiment we focused on the possibility that the instruction to generate songs at presentation had backfired, even to the extent of preventing the formation of independent (I-bonded) associations.

Experiment 5

Experiment 5 differed from Experiment 4 in two ways: first, no instructions for imagining songs were given at presentation. Second, test items contained the same format of successive components as had the presentation, that is, spoken words followed by hummed melodies. Abandoning the imagery instruction was intended to determine whether a melody and text could become independently associated in memory if the two components had been in close temporal proximity. Having established baseline melody recognition in Experiment 4, we saw no need to repeat the two hummed-melody conditions. The melody recognition test consisted of successive text/melody pairs under the following three conditions:

(a) old songs (a spoken text followed by its hummed melody, exactly as heard in the presentation);3

(b) mismatch songs (a spoken text followed by the hummed melody of a different text/melody pair from the presentation);

(c) old text with a new melody that had not been heard in the presentation.

Method

Materials

Eighteen of the 20 song pairs used in Experiment 4 were used in the present experiment.

Design

The design was comparable to that of Experiments 2 and 3. Three sets of tapes counterbalanced the test conditions for each presentation item across three subject groups. The presentation consisted of 24 text/melody pairs, and the test consisted of 18 text/melody pairs, six each of the three conditions.

Procedure

The procedure was comparable to the earlier experiments. Subjects were told to expect pairs of spoken texts and hummed melodies on the presentation and test. Melody recognition ratings were obtained as before.

Subjects

Fifteen adults with undetermined levels of musical training were equally divided among the three groups.

Results and Discussion

As before, melody recognition ratings had a theoretical range of 1 to 6. Means for the old song, mismatch, and old words/new melody conditions were 4.22, 4.12, and 2.98, respectively. Clearly, no significant difference emerged between the old song and mismatch conditions. In other words, hummed melodies were not better recognized when preceded by the same spoken text which had preceded that melody at presentation than when preceded by a different (yet equally familiar) text. Thus, no evidence was found that a link in memory is engendered by the successive presentation of independent texts and melodies. This leaves open the possibility that mental compounding is not the agency for the integration effect between melody and text reported in our earlier experiments. If not, then the submelodic hypothesis remains the only explanation for the effect with evidence in its favor. However, a lingering question is whether a simultaneous presentation of spoken text and hummed melody could give rise to an association in memory of the two components. This was addressed in the following experiment.

Experiment 6

The objective of this experiment was to assess the degree to which a simultaneous presentation of spoken words and hummed melody could give rise to an integration of the two such that the melody was recognized better in the presence of the text with which it had originally been presented than in the presence of a different (equally familiar) text. This result would indicate that our reasoning about independent associative bonding had been correct in Experiments 4 and 5, but our realization of contiguity had been inadequate.

The presentation episodes consisted of normal spoken texts and hummed melodies heard simultaneously and binaurally (but not dichotically). We refer to these simultaneous pairings as "spoken songs." The later recognition tests were of two types: half the subjects heard only spoken songs (as in the presentation) and half heard true, sung songs. Again, no instruction for the generation of song-like representations was given. So the question at hand was whether an association between contiguous components, if it occurred, would influence melody recognition only if the test stimuli were like those of the presentation, or whether that association's influence would extend also to the case of true songs.


The melody recognition tests for both the (between-subjects) conditions, with spoken songs and with true songs, consisted of three within-subject conditions. As before, the critical comparison was that between old songs and mismatch songs:

(a) old songs (same text and melody as was heard in the presentation);

(b) mismatch songs (the text of one pair and the melody of a different pair heard in the presentation);

(c) old words with new melody.

Method

Materials

Eighteen of the 20 song pairs were used. A master tape, from which experimental tapes were dubbed, was prepared by the same alto singer, as follows: Hummed melodies were first recorded in succession, each preceded by exactly four evenly spaced taps, also recorded onto the tape. The resulting signal was then fed into a second tape recorder at the same time that spoken texts were recorded onto a second tape. The singer listened to the hummed melodies from the first tape over headphones, using the four taps to fix the onset of the hummed melody, and then spoke the text along with the melody, recording both onto the second tape. Texts were generally spoken in the rhythm of the melody and also began and terminated in synchrony with it. When experimental tapes were dubbed from the master, the four taps were omitted. The test tapes employing true songs were the same as those employed previously with these materials.
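In digital terms, this tap-synchronized dubbing amounts to aligning the onset of the spoken track with that of the hummed track and mixing the two into one "spoken song." A rough numpy sketch of that idea (the onset threshold and all names are our assumptions; the original procedure was analog):

    import numpy as np

    def first_onset(x, threshold=0.02):
        # Index of the first sample whose magnitude exceeds the threshold.
        idx = np.flatnonzero(np.abs(x) > threshold)
        return int(idx[0]) if idx.size else 0

    def mix_aligned(speech, hum, threshold=0.02):
        # Delay or trim the speech so its onset matches the hum's onset,
        # then average the two tracks.
        shift = first_onset(hum, threshold) - first_onset(speech, threshold)
        if shift > 0:
            speech = np.concatenate([np.zeros(shift), speech])
        elif shift < 0:
            speech = speech[-shift:]
        n = min(len(speech), len(hum))
        return 0.5 * (speech[:n] + hum[:n])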

Design

The design was exactly analogous to that of Experiment 2, except that two sets of test tapes were constructed, each administered to a different group of subjects, one set with spoken songs and the other with true songs.

Procedure

The procedure was comparable to that of the earlier experiments, with subjects told to expect the spoken texts of simple folksongs to be heard simultaneously with hummed melodies. At test, one group was told that items would be true, sung songs, and the other group that test items would be similar to presentation items. In all cases, of course, instructions called for recognition based only on the melodies.

Subjects

Twenty-four adults with undetermined levels of musical training were equally divided between the two test groups.

Results and Discussion

Melody recognition ratings had a theoretical range of 1 to 6. Mean ratings for old songs, mismatch songs, and old words/new melody, respectively, were 4.56, 4.04, and 3.33 when the test consisted of spoken songs (as heard in the presentation) and 4.35, 3.96, and 3.25 when the test consisted of true songs. Two mixed analyses of variance were performed with type of test (spoken vs. true songs) as a between-subjects variable and the same three conditions ("old song," "mismatch," and new melody/"old words") as a within-subjects variable. With subjects as the sampling variable, only the main effect of conditions was significant, F(2,44)=21.68, p < .001; neither the type-of-test main effect nor the interaction was significant. The Newman-Keuls test indicated that the combined "old song" ratings for both groups (4.46) exceeded those for "mismatch songs" (4.00), p < .05. Similarly, with items as the sampling variable, only differences among the three conditions were significant, F(2,68)=13.65, p < .001. The Newman-Keuls test again supported the difference between "old song" and "mismatch" ratings, p < .05.

Thus, by the reasoning of the fourth and fifth experiments here, true temporal contiguity was the necessary condition for observing our integration effect. The most straightforward interpretation of that result is that, in Experiment 6, conditions were favorable for the formation of independent associative links between constituents that had not lost their individual identity. Close temporal proximity, as in Experiments 4 and 5, was apparently not enough.

Here, for the first time in this series, we may rule out the submelodic hypothesis, because the pairing manipulation cannot have had any substantial effect on the physical nature of each constituent.4 Likewise, the cognitive version of the submelodic hypothesis (J-Bonding, indicative of what we have called "mental chemistry") received no encouragement from these last three experiments. What makes the difference is not whether or not people try actively to integrate the melody and words in their minds, constructing an unheard song, but whether or not the two were strictly simultaneous.


Straightforward as this interpretation is, we are not so naive as to believe that one absent interaction from a single experiment can overthrow ideas as important as J-Bonding or Encoding Specificity. Besides the usual caution that we need more converging evidence on this point, it could be argued, albeit with considerable added assumptions, that all subjects, in both groups of this experiment, left the presentation sequence with self-generated songs as memory representations. Hearing a hummed melody at the same time as one hears a rhythmically matching stream of words might produce the experience of a song, whether the subject is trying to generate this or not. This would account for the integration effect among subjects tested with real songs. To account for the same effect in subjects tested with spoken songs, we need only observe that for these people, the conditions of acquisition and testing were exactly the same, which could have outweighed the disadvantage produced by the need for these subjects to generate song representations at test, as well as at acquisition.

An automatic process generating song-like representations from simultaneous, compatible verbal and musical streams would not be unexpected from a consideration of speech processing: in our ordinary lives, a simultaneous mixture of this sort is the rule rather than the exception, because the segmental features (in words) are always overlaid upon suprasegmental features, including specifically variation in fundamental frequency. For this ecological reason, simultaneous variation in pitch might be assigned to the prosodic aspect of speech automatically, even when the listener "knows" the verbal and tonal messages are nearly independent, as in listening to songs, or spoken songs. These considerations lead us to the design of future experiments better exploiting the tonal prosody and spoken message of integrated language communication.

General Discussion

As for the integration effect in song, our experiments in this and in the two previous papers (Serafine et al., 1984; Serafine et al., 1986) have guided our thinking in a number of ways. First of all, and despite musicological folklore to the contrary, the meaning of words seems to have a negligible role in the fact that melody and words of folksongs become stored in an integral fashion. Here again, in Experiment 2, the result withstood nonsense materials devoid of conventional meaning.

Secondly, we have adduced statistically reliable support for the submelodic hypothesis, suggesting that particular words can change the musical line sufficiently to influence recognition of the melodies later. It is no wonder people are slow to realize that "Baa, Baa, Black Sheep" and "Twinkle, Twinkle, Little Star" are words to the same tune; they are not, musically, quite the same tune, by virtue of the words to which each has been set.

Finally, we have uncovered a number of factors that govern the size of the effect, some statistically reliable on their own and others not. In retrospect, it was perhaps misguided for us to have thought that a single factor would control the integration effect. Among the agencies for which evidence exists, we must include first temporal contiguity, as shown in Experiment 6 here. Barring the unknown contribution of automatic fusion of text and melody in songs, hearing the words and the melody at the same time appears to affect their joint storage in the manner of paired associates. But we should not discard completely those factors uncovered by earlier experiments in this series as potential factors; even though they were not reliable on their own, they did measurably affect the size of the effect. Among these, we count instructions to attend only to melodies rather than to the whole songs at presentation (Serafine et al., 1984). Similarly, in the same experiment, acoustic non-identity of presentation and test materials (different singers, respectively) had an effect in the direction that would have been predicted (though not significantly). Elsewhere (Serafine et al., 1986, and here, in Experiment 4) we found that melodies in the presence of the wrong words did indeed have a distracting effect on melody recognition, beyond the facilitation that the correct words had.

Putting all these factors together, we believe we know well how to arrange conditions so as to maximize, or minimize, the integration of words and melodies in recognition of songs. This laboratory control is not unsatisfactory as explanation, provided one gives up the goal of having only one crucial component.

REFERENCES
Boring, E. G. (1957). A history of experimental psychology. New York: Appleton-Century-Crofts.
Erdei, P. (1974). American folksongs to read, sing, and play. New York: Boosey and Hawkes.
Horowitz, L. M., & Manelis, L. (1972). Toward a theory of redintegrative memory: Adjective-noun phrases. In G. H. Bower (Ed.), The psychology of learning and motivation, 6 (pp. 195-224). New York: Academic Press.



Serafine, M. L., Crowder, R. G., & Repp, B. H. (1984). Integration of melody and text in memory for song. Cognition, 16, 285-303.
Serafine, M. L., Davidson, J., Crowder, R. G., & Repp, B. H. (1986). On the nature of melody-text integration in memory for songs. Journal of Memory and Language.
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.

FOOTNOTES
*Memory & Cognition, in press (in a shorter version).
†Yale University.
1We are well aware that the dictionary meaning of the word contiguity stipulates that the events in question be juxtaposed, or adjacent in time, but not overlapping or coterminous. This departs from usage of the term within psychology, where successive and simultaneous arrangements are both considered contiguous. In this paper we remain with this latter usage even though the former might be more justifiable to some scholars.

2Throughout this paper, quotation marks on test item labels indicate a deviation from the nomenclature described under General Method. Here, for example, an "old song" is so labelled because it is the real-word phonetic equivalent of an old song and is not exactly what was heard in the presentation.

3The stimuli were, of course, in no sense true songs. However, we retain the same terminology as used in the other experiments.

4Certainly not in Experiments 4 and 5, where the two constituents did not overlap in time. In Experiment 6, with simultaneous contiguity, masking-like effects could have existed between the melodies and texts. This perceptual interaction is not what we mean by physical interaction, which could not have occurred in any of these experiments.



Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 162-171

Orthography and Phonology: The Psychological Reality of Orthographic Depth*

Ram Frost†

The representation of meaning by words is the basis of the human linguistic ability. Spoken words have an underlying phonologic structure that is formed by combining a small set of phonemes. The purpose of alphabetic orthographies is to represent and convey these phonologic structures in a graphic form. Just as languages differ one from the other, orthographic systems represent the various languages' phonologies in different ways. This diversity has been a source of interest for both linguists and psychologists. However, while linguistic inquiry aims to explain and describe the origins and characteristics of different orthographies, psychological investigation aims to examine the possible effects of these characteristics on human performance. Consequently, reading research is often concerned with the question of what is universal in the reading process across diverse languages, and what aspects of reading are unique to each language's orthographic system. My first objective in this chapter is to outline the properties of different alphabetic systems that might affect visual word processing. The second objective is to provide some empirical evidence to support the claim that reading processes are determined in part by the language's orthography.

Orthography, phonology and the mental lexicon

The purpose of orthographies is to designate specific lexical candidates. There is, however, some disagreement as to how exactly this purpose is achieved. The major discussions revolve around the role of phonology in the process of visual word recognition. Clearly, phonologic knowledge of words generally precedes orthographic knowledge; we are able to recognize many spoken words long before we are able to read them. Only later, in the process of learning to read, does the beginning reader master an orthographic system based, in western languages, on alphabetic principles.

*This work was supported in part by National Institute of Child Health and Human Development Grant HD-01994 to Haskins Laboratories. Many ideas in the present chapter were generated in collaboration with Shlomo Bentin. I am also indebted to Laurie Feldman, Len Katz, and Ignatius Mattingly for their criticism on earlier drafts of this paper.

The recognition of a printed word is based on a match between a letter string and a lexical representation. This match allows the reader access to the mental lexicon. However, since lexical access can theoretically be mediated by two types of abstract codes, orthographic and phonologic, a question remains about the exact transform of the printed word that is used in the process of visual word recognition: Is it informationally orthographic or phonologic?

One account argues that access to the mental lexicon is mainly phonologic (e.g., Liberman, Liberman, Mattingly, & Shankweiler, 1980). According to this view, orthographic information is typically recoded into phonologic information at a very early stage of print processing. Thus, the lexical access code for printed word perception is similar to that for spoken word perception. The appeal of this model is its parsimony and efficiency of storage; the reader does not need to build a visually coded grapheme-based lexicon, one that matches each of the words to spelling patterns in the language. Instead, a relatively small amount of information (knowledge of grapheme-to-phoneme correspondences) can recode print into a form every reader already knows: the speech-related phonologic form.

The second approach argues for the existence of an orthographic lexicon in addition to the phonologic one. According to this alternative view, lexical access for print can be achieved through




either system. The extreme position of this approach holds that lexical access is typically based only on the visual (orthographic) information, and the word's phonology is retrieved after lexical access has occurred. Possible exceptions are novel or low-frequency words that may lack an entry in the visually based lexicon (Seidenberg, Waters, & Barnes, 1984; Seidenberg, 1985). The appeal of such models is that visual lexical access is direct and, presumably, faster without the need for a mediating phonologic recoding. However, a model based on visual lexical representations must assume the existence of a memory store of orthographically coded words that parallels, in orthographic coding, most of the information the reader already possesses as phonologic knowledge.

Clearly, the reader is well aware of both orthographic and phonologic structures of a printed word. Hence, the debate concerning orthographic and phonologic coding is merely a debate about priority: is phonology necessary for printed word recognition to occur, or is it just an epiphenomenon that results from it? In other words: is phonology derived pre-lexically from the printed letters, serving as the reader's code for lexical search, or, rather, is lexical search based on the word's orthographic structure while phonology is derived post-lexically?

This question is often approached by monitoring and comparing subjects' responses in the lexical decision and the naming tasks. In lexical decision the subject is required to decide whether a letter string is a valid word or not, while in naming he is required to read the letter string aloud. In both tasks reaction times and error rates are measures of subjects' performance. Note that lexical decisions can be based on the recognition of either the orthographic or the phonologic structure of the printed word. In contrast, naming explicitly requires the retrieval of the printed word's phonology. Phonology, however, can be generated either pre-lexically by converting the letters into phonemes, or post-lexically by accessing the mental lexicon through the word's complete orthographic structure, and retrieving the phonologic information from the lexicon.

Since, at least theoretically, these two alternative processes are available to the reader, one should compare their relative efficiency. It has been suggested that the ability to rapidly generate pre-lexical phonology depends primarily on the reader's fluency, task characteristics, and the printed stimuli's complexity (see McCusker, Hillinger, & Bias, 1981, for a review). In our

present context, only the factor of stimulus complexity is of special interest. Complexity is generally related to the amount of effort needed for decoding a given word. One possible source of complexity that merits close examination is the lack of transparent correspondence between orthographic and phonologic subunits. Because the purpose of orthographic systems is the representation of phonology, whether the skilled reader uses this information or not, the relative directness and simplicity of this representation (its transparency) can be of major importance.

Orthographic depth: Evidence from the shallow Serbo-Croatian

Although the transparency between spelling and phonology varies within orthographies, it varies more widely between orthographies. The source of this variance can often be attributed to morphological factors. In some languages (e.g., in English), morphological variations are captured by phonologic variations. The orthography, however, was designed to preserve primarily morphologic information. Consequently, in many cases, similar spellings denote the same morpheme but different phonologic forms: the same letter can represent different phonemes when it is in different contexts, and the same phoneme can be represented by different letters. The words "heal" and "health," for example, are similarly spelled because they are morphologically related. However, since in this case a morphologic derivation resulted in a phonologic variation, the cluster "ea" represents both the sounds [i] and [ɛ].

Within this context English is often compared to Serbo-Croatian. In Serbo-Croatian (aside from minor changes in stress patterns), phonology almost never varies with morphologic derivations. Consequently, the orthography was designed to represent directly the surface phonology of the language: Each letter denotes only one phoneme, and each phoneme is represented by only one letter. Thus, alphabetic orthographies can be classified according to the transparency of their letter-to-phonology correspondence. This factor is usually referred to as "orthographic depth" (Klima, 1972; Liberman et al., 1980; Lukatela, Popadić, Ognjenović, & Turvey, 1980; Katz & Feldman, 1981). An orthography that represents its phonology in an unequivocal manner is considered shallow, while in a deep orthography the relation of orthography to phonology is more opaque.

Katz and Feldman (1981) suggested that the kind of code that is used for lexical access depends





on the kind of alphabetic orthography facing the reader. Shallow orthographies can easily support a reading process that uses the language's surface phonology. On the other hand, in deep orthographies, the reader is encouraged to process printed words by referring to their morphology via their visual-orthographic structure. Note that orthographic depth does not necessarily have a clear psychological reality. For example, it has been argued that visual-orthographic access is faster and more direct than phonologic access (e.g., Baron & Strawson, 1976). By this argument, it might be the case that in all orthographies words can be accessed easily by recognizing their orthographic structures visually. Therefore, the relation between spelling and phonology should not necessarily affect subjects' performance.

Most of the earlier studies in word recognition were conducted with English materials. But in order to validate the psychological reality of orthographic depth, experimenters turned to shallower orthographies like Serbo-Croatian.

In addition to its direct spelling-to-phonology correspondence, the Serbo-Croatian orthography has an additional important feature: It uses either the Cyrillic or the Roman alphabet, and the reader is equally familiar with both sets of characters. Most characters are unique to one alphabet or the other, but there are some characters that occur in both. Of these, some receive the same phonemic interpretation regardless of alphabet. These are called COMMON letters. Others receive a different interpretation in each alphabet. These are known as AMBIGUOUS letters. Letter strings that include unique letters can be read in only one alphabet. Similarly, letter strings composed exclusively of common letters can be read in only one way. By contrast, strings composed only of AMBIGUOUS and COMMON letters are bivalent: They can be read in one way by treating the characters as Roman graphemes and in a distinctly different way by treating them as Cyrillic graphemes. The two alphabets are presented in Figure 1. This specific feature of the Serbo-Croatian orthography was used in several studies in order to examine phonological processing in visual word recognition (Lukatela et al., 1980; Feldman & Turvey, 1983).
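Because the classification of a letter string follows directly from these letter categories, it can be summarized in a short sketch. The letter sets below are a small illustrative subset rather than the full Serbo-Croatian inventory, and the function is ours, not drawn from the studies cited:

    # Classify an uppercase letter string by how many ways it can be read.
    # Letter sets are an illustrative subset of the two alphabets.
    COMMON = {"A", "E", "O", "J", "K", "M", "T"}        # same phoneme in both alphabets
    AMBIGUOUS = {"B", "C", "H", "P"}                    # different phoneme in each alphabet
    UNIQUE_ROMAN = {"D", "F", "G", "L", "N", "R", "S"}  # occur only in the Roman alphabet
    UNIQUE_CYRILLIC = {"Б", "Г", "Д", "Ж", "Л", "П"}    # occur only in the Cyrillic alphabet

    def classify(string):
        letters = set(string)
        if letters & UNIQUE_ROMAN and letters & UNIQUE_CYRILLIC:
            return "unreadable"    # mixes letters unique to different alphabets
        if letters & (UNIQUE_ROMAN | UNIQUE_CYRILLIC):
            return "one reading"   # a unique letter forces a single alphabet
        if letters <= COMMON:
            return "one reading"   # common letters sound the same either way
        return "bivalent"          # common plus ambiguous letters: two readings

    print(classify("POTOP"))  # bivalent: one Roman and one Cyrillic pronunciation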

Lukatela et al. (1980) investigated lexical decision performance in Serbo-Croatian, for words printed in the Cyrillic and the Roman alphabets. They demonstrated that words that could be read in two different ways were accepted as words more slowly than words that could be read in one way. Thus, the fact that one orthographic form had two phonologic interpretations slowed subjects' reaction times. This outcome suggested that the subjects were sensitive to the phonologic structure of the printed stimuli while making lexical decisions. Lukatela et al. concluded that lexical decisions in Serbo-Croatian are necessarily based on the extraction of phonology from print. Similar results were found by Feldman and Turvey (1983), who compared phonologically ambiguous and phonologically unequivocal forms of the same lexical items. They suggested that the direct correspondence of spelling to phonology in Serbo-Croatian results in an obligatory phonologic analysis of the printed word that determines lexical access. Moreover, in contrast to data obtained in English, the skilled reader of Serbo-Croatian demonstrates a bias towards a phonologically analytic strategy.

Figure 1. The uppercase Serbo-Croatian alphabets: uniquely Roman letters, uniquely Cyrillic letters, letters common to both alphabets, and ambiguous letters.

Evidence from the deeper Hebrew orthography

The term "orthographic depth" has been usedwith a variety of related but different meanings.Frost, Katz, and Bentin (1987), suggested that itcan be regarded as a continuum on whichlanguages can be arrayed. They proposed that theHebrew orthography could be positioned at theextreme end of this continuum, since it representsthe phonology in an ambiguous manner.

Hebrew, like other Semitic languages, is based on word families that are derived from tri-consonantal roots. Therefore, many words share an identical letter configuration. The orthography was designed primarily to convey to the reader the word's morphologic origin. Hence, the letters in Hebrew represent mainly consonants, while the vowels are conveyed by diacritical marks presented beneath the letters. The vowel marks,




however, are omitted from regular reading material, and can be found only in poetry, children's literature, or religious scripts (for a detailed description of the Hebrew orthography see Navon & Shimron, 1984). When the vowels are absent, a single printed consonantal string usually represents several different spoken words (sometimes up to seven or eight words can be represented by a single letter string). The Hebrew reader is, therefore, regularly exposed to both phonologic and semantic ambiguity. An illustration of the ambiguous unvoweled Hebrew print is presented in Figure 2.

Although it is clear that the Hebrew orthography is an example of a very deep orthography, this is for different reasons than those presented in the context of the English vs. Serbo-Croatian distinction. English is labeled as deep because of the opaque correspondence between single graphemes and phonemes in the language's spelling system. In contrast, this correspondence is fairly clear in Hebrew, since the consonants presented in print, aside from a few exceptions, correspond to only one phoneme. However, because the vowels are absent, the Hebrew orthography conveys less phonologic information than many other orthographies. Hence, it is not just ambiguous, it is incomplete. This characteristic of Hebrew, as I will argue, is not only linguistic but also psychological, in that it provides a possible explanation of differences in reading performance revealed in this language.

In order to assign a correct vowel configuration to the printed consonants to form a valid word, the reader of Hebrew has to draw upon his lexical knowledge. The choice among the possible lexical alternatives is usually based on contextual information: the semantic and syntactic contexts constrain the possible vowel interpretations. For an unvoweled word in isolation, however, the reader cannot rely on contextual information for the process of disambiguation.
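The reader's predicament can be pictured as a one-to-many lookup from print to lexical candidates. The sketch below follows the sfr example of Figure 2; the transcriptions and glosses are an illustrative subset, and the function is ours:

    # One unvoweled consonantal string maps onto several voweled spoken words.
    # Entries follow the root sfr of Figure 2 (an illustrative subset).
    UNVOWELED_LEXICON = {
        "spr": [
            ("sefer",  "book"),
            ("safar",  "he counted"),
            ("sipper", "he told"),
            ("sapar",  "barber"),
            ("sfar",   "border region"),
        ],
    }

    def lexical_alternatives(consonants):
        """Return every voweled word consistent with an unvoweled print string."""
        return UNVOWELED_LEXICON.get(consonants, [])

    # Without sentence context, all alternatives remain open; the naming data
    # discussed below suggest the high-frequency one is selected first.
    for phonemic_form, gloss in lexical_alternatives("spr"):
        print(phonemic_form, "-", gloss)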

Several studies have examined the reading processes of isolated Hebrew words. Bentin, Bargai, and Katz (1984) examined naming and lexical decision for unvoweled consonantal strings. Some of these strings could be read as more than one word while some could be read as one word only. The results demonstrated that naming of phonologically ambiguous strings was slower than naming of unambiguous ones. In contrast, no effect of ambiguity was found in the lexical decision task. These results suggest that the reader is indeed sensitive to the phonologic structure of the orthographic string when naming is required. Lexical decisions, by contrast, are not based on a detailed phonological analysis of the printed word in Hebrew. Note that this outcome is in sharp contrast to the results obtained in the shallow Serbo-Croatian.

Figure 2. Unvoweled and voweled forms of the Hebrew tri-consonantal root ספר (sfr), with phonemic transcriptions and meanings of the alternative readings.




Lexical decisions and naming of isolated Hebrew words were further investigated in a study by Bentin and Frost (1987). In this study subjects were presented with phonemically and semantically ambiguous consonantal strings. Each of the ambiguous strings could have been read either as a high-frequency word or as a low-frequency word, depending upon different vowel assignments. Decision latency for the unvoweled consonantal string was compared to the latencies for both the high- and the low-frequency voweled words. The results showed that lexical decisions for the unvoweled ambiguous strings were faster than lexical decisions for either of their voweled (therefore disambiguated) alternatives. This outcome was interpreted as evidence that lexical decisions for Hebrew unvoweled words were given prior to the process of phonological disambiguation. The decisions were probably based on the printed word's orthographic familiarity (cf. Balota & Chumbley, 1984; Chumbley & Balota, 1984). Thus, it is likely that lexical decisions in Hebrew involve neither a pre-lexical phonologic code nor a post-lexical one. They are based upon the abstract linguistic representation that is common to several phonemic and semantic alternatives.

These results are in contrast to studies on lexical ambiguity conducted in English. Lexical disambiguation in English can be examined by employing homographs. Such studies have suggested that, at least initially, all meanings, high- as well as low-frequency, are automatically accessed in parallel (Onifer & Swinney, 1981; Tanenhaus, Leiman, & Seidenberg, 1979; and see Simpson, 1984, for a review). It should be noted, however, that in most cases the ambiguity in English resides only in the semantic and syntactic levels. With a few exceptions (e.g., "bow," "wind"), English homographs have only one phonologic representation, and the reader usually does not have to access two different words related to one printed form.

Although lexical decision in Hebrew might be based on an abstract orthographic representation, there is no doubt that the process of word identification continues until one of several phonological and semantic alternatives is finally accessed. This process of lexical disambiguation is more clearly revealed by using the naming task. Bentin and Frost (1987) investigated the process of selecting specific lexical candidates by examining the naming latencies of unvoweled and voweled words. In contrast to the result obtained for lexical decisions, naming of ambiguous strings was found to be just as fast as naming the most frequent voweled alternative, with the voweled low-frequency alternative slowest. In the absence of constraining context, the selection of one lexical candidate for naming seems to be affected by a frequency factor: the high-frequency alternative is selected first.

In a recent study (Frost & Bentin, in preparation), the processing of ambiguous consonantal strings in voweled and unvoweled Hebrew print was investigated by using a semantic priming paradigm. Subjects were presented with consonantal strings that could be read as a high- or a low-frequency word. These strings served as primes for targets that were related to one of the two alternative meanings. In order to minimize conscious attentional processes, targets followed the primes at a short SOA (stimulus onset asynchrony) of 100 ms (see Neely, 1976, 1977, for a discussion of this point). It was assumed that if a specific meaning of the ambiguous consonantal string was accessed, it would be reflected by a semantic facilitation for its respective target. Thus, lexical decisions for targets that are related to that specific meaning would be facilitated.
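As a rough illustration of the trial structure, a schematic sketch follows; the display routine is a stand-in for real presentation software, and all names are ours rather than the authors':

    import time

    def show(stimulus):
        # Stand-in for a millisecond-accurate display routine.
        print("displaying:", stimulus)

    def priming_trial(prime, target, soa_ms=100):
        """One trial: prime onset, then target onset soa_ms later (the SOA)."""
        show(prime)
        time.sleep(soa_ms / 1000.0)  # 100 ms leaves little room for conscious strategies
        show(target)
        # The subject then makes a lexical decision on the target; semantic
        # facilitation is inferred from faster decisions for related targets.

    priming_trial("ambiguous consonantal prime", "target related to one meaning")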

In contrast to studies on priming at short SOAs in English (e.g., Seidenberg, Tanenhaus, Leiman, & Bienkowsky, 1982), no semantic facilitation for the low-frequency meanings was found in the unvoweled condition at 100 ms SOA. In the voweled condition there was a significant semantic facilitation for both the high- and the low-frequency meanings. This result suggests that in the voweled condition both the high-frequency and the low-frequency meanings of the consonantal strings were clearly depicted by the disambiguating vowel marks.

Apparently, since the Hebrew reader almost never reads voweled print, he uses the consonantal information for accessing the lexicon. The phonologic representation of the high-frequency alternative is selected first. Only at a second stage does the reader consider the low-frequency alternative.

In conclusion, the deep unvoweled Hebrew orthography represents primarily the morphology of the Hebrew language, while phonemic information is conveyed only partially by print. Consequently, in addition to a phonologic lexicon, the Hebrew reader has probably developed a lexical system which is based on phonologically and semantically abstract consonantal strings that are common to several words. Lexical processing occurs, at a first phase, at this




morphological level. The reader accesses the abstract string and recognizes it as a valid morphologic structure. Lexical decisions are usually given at this early stage and do not necessarily involve deeper phonological processing. The complete phonological structure of the printed word can only be retrieved post-lexically, after one word candidate has been accessed. The selection of a word candidate is usually constrained by context, but in its absence it is based on frequency factors.

Evidence from cross-language studies

Conducting experiments in different languages contributes important insights concerning the role of pre- or post-lexical phonology in deep and shallow orthographies. Nevertheless, conclusive inferences cannot be drawn from these studies unless they are supported by results obtained in cross-language designs. Cross-language designs allow a direct comparison of native speakers' performances when the independent variables under investigation are controlled between languages, under identical experimental conditions. Hence, they can provide direct evidence concerning the effects of the orthography's characteristics on the process of word recognition. Obviously, cross-language designs are not without potential pitfalls; language differences may be confounded with nonlinguistic factors. For example, differences in the subjects' samples due to motivation, education, etc., might interact with the experimental manipulation. The interpretation of the results thus hinges on whether they are likely to be free of such confounding.

Katz and Feldman (1983) compared semantic priming effects in naming and lexical decision in English and Serbo-Croatian. In this study, semantic facilitation was assumed to reflect lexical involvement in both tasks. The results demonstrated semantic facilitation for both lexical decision and naming in English. In contrast, semantic priming facilitated only lexical decision in Serbo-Croatian. The authors suggested that phonology, which is necessary for naming, is derived post-lexically in English: hence the semantic facilitation in this task. In contrast, the extraction of phonology from print in Serbo-Croatian does not call for lexical involvement but is derived pre-lexically. An additional finding in the study was a high correlation of reaction times for lexical decision and naming in Serbo-Croatian without semantic context. This result was interpreted as evidence for an articulatory code used in this language for both lexical decisions and naming.

The interpretation of differences in reading performance between two languages, as reflecting subjects' use of pre- vs. post-lexical phonology, can be criticized on methodological grounds. The correspondence between orthography and phonology is only one dimension on which two languages differ. English and Serbo-Croatian, for example, differ in their grammatical structures, and in the size and organization of their lexicons (Lukatela, Gligorijević, Kostić, & Turvey, 1980). These confounding factors, it can be argued, may have affected subjects' performance in a similar way.

Frost, Katz, and Bentin (1987) endeavored to address this possible criticism by comparing three languages simultaneously. They examined lexical decision and naming performance in Hebrew, English, and Serbo-Croatian. Although any comparison between two of the languages might be confounded by other factors, the set of confounds is different for each of the three possible pairs of comparisons. The only factor that displays consistency with the dependent measure is orthographic depth. Assuming that it is indeed the main factor that influences subjects' performance, predictions concerning a two-language comparison should be extended to the third language. But note that while the probability of obtaining a predicted correct ordering of performance in two languages is one out of two, the probability is one out of six when three languages are compared. Thus, an appropriate ordering of subjects' performance in three languages would corroborate more strongly the psychological reality of orthographic depth.
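The arithmetic behind these odds is simply a count of orderings: under a null hypothesis in which orthographic depth is irrelevant, every ordering of the n languages' mean performance is equally likely, so

    P(\text{predicted ordering}) = \frac{1}{n!}, \qquad \frac{1}{2!} = \frac{1}{2}, \qquad \frac{1}{3!} = \frac{1}{6}.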

In their first experiment, Frost et al. (1987) compared, in each language, reaction times for both lexical decision and naming of high-frequency words, low-frequency words, and nonwords, in English, Serbo-Croatian, and Hebrew. The results showed that the lexical status of the stimulus (being a high- or a low-frequency word, or a nonword) affected naming latencies in Hebrew more than in English, and in English more than in Serbo-Croatian. Moreover, only in Hebrew were the effects on naming very similar to the effects on lexical decision: Just as the lexical status of the stimulus affected lexical decisions, it also affected naming latencies. This outcome confirmed that in deep orthographies like Hebrew, phonology is derived post-lexically. In contrast, in a shallow orthography like Serbo-Croatian, naming performance is much less affected by lexical status. Given the direct




correspondence of orthography to phonology, the extraction of phonology from print does not call for lexical involvement.

In a second experiment, Frost et al. compared semantic priming effects in naming. Semantic priming usually facilitates lexical access. Hence, if the word's phonology is derived post-lexically in deep orthographies but pre-lexically in shallow orthographies, then naming should be facilitated more in Hebrew than in English, and again, more in English than in Serbo-Croatian. As hypothesized, the results revealed a relatively strong effect of semantic facilitation in Hebrew (21 ms), a smaller but significant effect in English (16 ms), and no facilitation in Serbo-Croatian whatsoever. These results were taken to strongly support the validity of the orthographic depth factor in word recognition.

In a recent study, Frost and Katz (1989) investigated how the different relations between spelling and phonology in English and Serbo-Croatian are reflected in the ability of subjects to match printed and spoken stimuli. They presented subjects simultaneously with words or nonwords in the visual and the auditory modality, and the subject's task was to judge whether the stimuli were the same or different. In order to carry out the matching process, the subjects had to mentally recode the print to phonology and compare it to the phonologic information provided by the speech. Performance was measured in three experimental conditions: (1) clear print and clear speech, (2) clear print and degraded speech, and (3) clear speech and degraded print. Within each language, the effects of visual and auditory degradation were measured relative to the baseline undegraded presentation.

When the visual or the auditory inputs are degraded, subjects are encouraged to restore the partial information in one modality by matching it to the clear information in the other modality. When subjects are presented with speech alone, restoration of degraded speech components has been shown to be an automatic lexical process (see Samuel, 1987). However, in addition to this ipsimodal restoration mechanism, subjects in the Frost and Katz experiment had the additional possibility of a compensatory exchange of speech and print information. Thus, the technique of simultaneous visual and auditory presentation and degradation provided insight concerning the interaction of orthography and phonology in the different languages.

The results showed that for Serbo-Croatian, visual degradation had a stable effect relative to the baseline condition (about 20 ms), regardless of stimulus frequency. For the English subjects, the effect of visual degradation was three to four times stronger than for the Serbo-Croatians. The inter-language differences that were found for visual degradation were almost identically replicated for auditory degradation: The degradation effects in English were again three to four times greater than in Serbo-Croatian. Thus, the overall pattern of results demonstrated that although the readers of English were efficient in matching print to speech under normal conditions, their efficiency deteriorated substantially under degraded conditions relative to readers of Serbo-Croatian.

These results were explained by an extension of an interactive model (see McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982) that rationalizes the relationship between the orthographic and phonologic systems in terms of lateral connections between the systems at all of their levels. The structure of these lateral connections is determined by the relationship between spelling and phonology in the language: simple isomorphic connections between graphemes and phonemes in Serbo-Croatian, but more complex, many-to-one, connections in English. The concept of orthographic depth has direct bearing on the question of the relation between the phonologic and orthographic systems. Within such interactive models, the way in which connections are made between the two systems should be constrained by the depth of the orthography that is being modeled. In a shallow orthography, a graphemic node can be connected to only one phonemic node, and vice versa. Also, because words are spelled uniquely, each word node in the orthographic system must be connected to only one word node in the phonologic system. In contrast, in a deeper orthography, a graphemic node may be connected to several phonemic alternatives, a phonemic cluster may be connected to several orthographic clusters, and finally, a word in the phonologic system may be connected to more than one word in the orthographic system, as in the case of homophony (e.g., SAIL/SALE) or, vice versa, as in the case of homography (e.g., WIND, READ, BOW, etc.). A representation of the different intersystem connections is demonstrated in Figure 3 for a word that exists in both the English and the Serbo-Croatian languages. The Serbo-Croatian word, KLOZET, is composed of unique letter-sound correspondences while the corresponding English word, CLOSET, is composed of graphemes, most of which have more than one possible phonologic representation, and phonemes,




most of which have more than one orthographic representation.

Figure 3. Intersystem connections between the orthographic and phonologic networks in Serbo-Croatian (top) and English (bottom).

As shown in Figure 3, the simple isomorphic connections between the orthographic and the phonologic systems in a shallow orthography should enable subjects to restore with ease both the degraded phonemes from the print and the degraded graphemes from the phonemic information. This should be true because, in a shallow system, partial phonemic information can correspond to only one, or at worst a few, graphemic alternatives, and vice versa. In contrast, in a deep orthography, because the degraded information in one system is usually consistent with several alternatives in the other system, the buildup of sufficient information for a unique solution to the matching judgment is delayed, and the matching between print and degraded speech, or between speech and degraded print, is slowed. Therefore, the effects of visual or auditory degradation were greater for English than for Serbo-Croatian.
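The combinatorial consequence of the two connection structures can be made concrete with a toy sketch; the phoneme sets below are invented for illustration and are not drawn from the model itself:

    # Toy grapheme-to-phoneme connection tables (illustrative values only):
    # one connection per grapheme in the shallow case, several in the deep case.
    SERBO_CROATIAN = {  # KLOZET: unique letter-sound correspondences
        "K": ["k"], "L": ["l"], "O": ["o"], "Z": ["z"], "E": ["e"], "T": ["t"],
    }
    ENGLISH = {         # CLOSET: several graphemes are phonologically ambiguous
        "C": ["k", "s"], "L": ["l"], "O": ["o", "ah"], "S": ["s", "z"],
        "E": ["eh", "silent"], "T": ["t"],
    }

    def phonemic_alternatives(word, connections):
        """Count the phonemic readings consistent with the print, letter by letter."""
        count = 1
        for letter in word:
            count *= len(connections[letter])
        return count

    print(phonemic_alternatives("KLOZET", SERBO_CROATIAN))  # 1: a unique solution
    print(phonemic_alternatives("CLOSET", ENGLISH))         # 16: ambiguity to resolve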

The importance of orthographic depth: Critique and conclusions

The psychological reality of orthographic depth is not unanimously accepted. Although it is generally agreed that the relation between spelling and phonology in different orthographies might affect reading processes to a certain extent, there is disagreement as to the relative importance of this factor. Seidenberg and his associates (Seidenberg et al., 1984; Seidenberg, 1985; Seidenberg & Vidanović, 1985) have argued that the primary factor determining whether or not phonology is generated prelexically is not orthographic depth, but word frequency. Their claim is that in any orthography, frequent words are very familiar as visual patterns. Therefore, these words can be easily recognized through a fast visually based lexical access which occurs before a phonologic code has time to be generated pre-lexically from the print. For these words, phonologic information is eventually obtained, but only postlexically, from memory storage. According to this view, the relation of spelling to phonology should not affect recognition of frequent words. Since the orthographic structure is not converted into a phonologic structure by use of grapheme-to-phoneme conversion rules, the depth of the orthography does not play a role in the processing of these words. Orthographic depth exerts some influence, but only on the processing of low-frequency words and nonwords. Since such verbal stimuli are less familiar, their visual lexical access is slower, and their phonology has enough time to be generated prelexically.
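This account amounts to a horse race between two routes, which can be sketched with invented timing parameters (the numbers carry no empirical weight):

    # Toy horse race between visual lexical access and prelexical phonologic
    # assembly; whichever finishes first determines how phonology is obtained.
    PHONOLOGY_ASSEMBLY_MS = 350  # invented constant for prelexical assembly time

    def winning_route(visual_access_ms):
        if visual_access_ms < PHONOLOGY_ASSEMBLY_MS:
            return "visual access wins: phonology retrieved postlexically"
        return "assembly wins: phonology generated prelexically"

    print(winning_route(visual_access_ms=200))  # frequent word: familiar visual pattern
    print(winning_route(visual_access_ms=500))  # low-frequency word or nonword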

In support of this hypothesis, Seidenberg (1985) demonstrated that there were few differences between Chinese and English subjects in naming frequent printed words. This outcome was interpreted to mean that in both logographic and alphabetic orthographies, the phonology of frequent words was derived postlexically, after the word had been recognized on a visual basis. Moreover, in another study, Seidenberg and Vidanović (1985) found similar semantic priming effects in naming frequent words in English and Serbo-Croatian, suggesting again that the phonology of frequent words is derived postlexically, whatever the depth of the orthography. These results are consistent with a recent study by Carello, Lukatela, and Turvey (1988) that demonstrated associative priming effects for naming in Serbo-Croatian. Although Carello et al. did not manipulate word frequency in their study, their results question the inevitability of pre-lexical phonology in a shallow




orthography; some lexical influence on word recognition may be possible.

The resolution of these conflicting results is certainly not a simple task. A possible approach for examining the source of these differences could consist of examining the experimental characteristics of these studies. One salient feature of most of the experiments discussed above is that they were conducted exclusively in the visual modality; that is, print alone was used to study the relationship between orthography and phonology. The experimental manipulation of phonology, therefore, has been indirect, having been derived from manipulating the orthography. One can criticize this methodology for studying the processing consequences of the relation between phonology and orthography: Because phonologic variation is typically obtained through orthographic variation, one can never be certain which of the two is controlling the subject's responses. A simple example can be given in the case of homophones. The common assumption that two homophones (e.g., bear/bare; sale/sail) share a phonologic but not an orthographic structure (see, for example, Rubenstein et al., 1971) is, in a way, misleading. Homophones always share printed consonants or vowels, and the task of disentangling the effect of the shared phonology from the shared orthography is complicated. Moreover, doubts have been raised about the adequacy of the lexical decision and naming tasks for measuring lexical as contrasted with prelexical involvement (see Balota & Chumbley, 1984, 1985).

The technique of simultaneous visual and auditory presentation with degradation proposed by Frost and Katz (1989; see also Frost, Repp, & Katz, 1988) furnishes partial solutions to these methodological problems. First, phonology is presented to the subjects through a spoken word and does not have to be inferred from print. More importantly, by degrading the print or the speech, the technique affords a way to independently manipulate the perception of orthography and phonology. By using this method, Frost and Katz (1989) have demonstrated that orthographic depth and not word frequency is the primary factor that affects the generation of pre- or post-lexical phonology.

However, the assessment of the role of orthographic depth in reading cannot be resolved solely with methodological arguments. One important conclusion from two decades of studies in reading is that the reader uses various strategies in processing printed words (see McCusker et al., 1981). These strategies have been shown to depend on factors like orthographic regularity (Parkin, 1982), word frequency (Scarborough, Cortese, & Scarborough, 1977), the ratio of words and nonwords (Frost et al., 1987), or special demand characteristics of the experimental task (e.g., Spoehr, 1978). By the same argument, one cannot fully account for the reader's processing without taking into consideration the reader's linguistic environment. Although the skilled reader in every orthography becomes familiar with his own language's orthographic structures, I suggest that the depth of the orthography is an important factor.

One common misinterpretation of claims concerning the importance of orthographic depth is to view a language's orthographic system as constraining the reader to only one form of processing. For example, although Frost et al. (1987) have shown no semantic facilitation for naming a specific set of stimuli in Serbo-Croatian, it does not follow that Serbo-Croatian readers never generate phonology post-lexically. One should always give the reader credit for extensive flexibility. If the words in the experiments were closely associated, even the Serbo-Croatian reader might find the post-lexical extraction of phonology more efficient than a pre-lexical extraction. But under similar conditions, relative differences should be found between deep and shallow orthographies.

In conclusion, the argument concerning the effect of orthographic depth is an argument concerning the priority of using a specific processing strategy for generating phonology in different orthographies. Research conducted in English, Serbo-Croatian, and Hebrew suggests that orthographic depth has indeed a strong psychological reality.

REFERENCES
Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception and Performance, 10, 340-357.
Balota, D. A., & Chumbley, J. I. (1985). The locus of word-frequency effects in the pronunciation task: Lexical access and/or production? Journal of Memory and Language, 24, 89-106.
Baron, J., & Strawson, C. (1976). Use of orthographic and word-specific knowledge in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 2, 386-393.
Bentin, S., Bargai, N., & Katz, L. (1984). Orthographic and phonemic coding for lexical access: Evidence from Hebrew. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 353-368.
Bentin, S., & Frost, R. (1987). Processing lexical ambiguity and visual word recognition in a deep orthography. Memory & Cognition, 15, 13-23.
Carello, C., Lukatela, G., & Turvey, M. T. (1988). Rapid naming is affected by association but not by syntax. Memory & Cognition, 16, 187-195.
Feldman, L. B., & Turvey, M. T. (1983). Visual word recognition in Serbo-Croatian is phonologically analytic. Journal of Experimental Psychology: Human Perception and Performance, 9, 288-298.
Frost, R., & Katz, L. (1989). Orthographic depth and the interaction of visual and auditory processing in word recognition. Memory & Cognition, 17, 302-310.
Frost, R., Katz, L., & Bentin, S. (1987). Strategies for visual word recognition and orthographical depth: A multilingual comparison. Journal of Experimental Psychology: Human Perception and Performance, 13, 104-115.
Frost, R., Repp, B. H., & Katz, L. (1988). Can speech perception be influenced by a simultaneous presentation of print? Journal of Memory and Language, 27, 741-755.
Katz, L., & Feldman, L. B. (1981). Linguistic coding in word recognition: Comparisons between a deep and a shallow orthography. In A. M. Lesgold & C. A. Perfetti (Eds.), Interactive processes in reading (pp. 85-105). Hillsdale, NJ: Lawrence Erlbaum Associates.
Katz, L., & Feldman, L. B. (1983). Relation between pronunciation and recognition of printed words in deep and shallow orthographies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 157-166.
Klima, E. S. (1972). How alphabets might reflect language. In J. F. Kavanagh & I. G. Mattingly (Eds.), Language by ear and by eye. Cambridge, MA: The MIT Press.
Koriat, A. (1985). Lexical access for low- and high-frequency words in Hebrew. Memory & Cognition, 13, 37-44.
Liberman, I. Y., Liberman, A. M., Mattingly, I. G., & Shankweiler, D. (1980). Orthography and the beginning reader. In J. F. Kavanagh & R. L. Venezky (Eds.), Orthography, reading, and dyslexia (pp. 137-153). Austin, TX: Pro-Ed.
Lukatela, G., Popadić, D., Ognjenović, P., & Turvey, M. T. (1980). Lexical decision in a phonologically shallow orthography. Memory & Cognition, 8, 415-423.
McClelland, J. L., & Rumelhart, D. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
McCusker, L. X., Hillinger, M. L., & Bias, R. G. (1981). Phonologic recoding and reading. Psychological Bulletin, 89, 217-245.
Navon, D., & Shimron, Y. (1984). Reading Hebrew: How necessary is the graphemic representation of vowels? In L. Henderson (Ed.), Orthographies and reading: Perspectives from cognitive psychology, neuropsychology, and linguistics. Hillsdale, NJ: Lawrence Erlbaum Associates.
Neely, J. H. (1976). Semantic priming and retrieval from lexical memory: Evidence for facilitatory and inhibitory processes. Memory & Cognition, 4, 648-654.
Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226-254.
Onifer, W., & Swinney, D. A. (1981). Accessing lexical ambiguities during sentence comprehension: Effects of frequency of meaning and contextual bias. Memory & Cognition, 9, 225-236.
Parkin, A. J. (1982). Phonologic recoding in lexical decision: Effects of spelling-to-sound regularity depend on how regularity is defined. Memory & Cognition, 10, 43-53.
Rubenstein, H., Lewis, S. S., & Rubenstein, M. A. (1971). Evidence for phonemic recoding in visual word recognition. Journal of Verbal Learning and Verbal Behavior, 10, 645-657.
Rumelhart, D., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Samuel, A. G. (1987). Lexical uniqueness effects on phonemic restoration. Journal of Memory and Language, 26, 36-56.
Scarborough, D. L., Cortese, C., & Scarborough, H. S. (1977). Frequency and repetition effects in lexical memory. Journal of Experimental Psychology: Human Perception and Performance, 3, 1-17.
Seidenberg, M. S. (1985). The time course of phonologic activation in two writing systems. Cognition, 19, 1-30.
Seidenberg, M. S., Tanenhaus, M. K., Leiman, J. M., & Bienkowsky, M. (1982). Automatic access of the meanings of ambiguous words in context: Some limitations of knowledge-based processing. Cognitive Psychology, 14, 489-537.
Seidenberg, M. S., & Vidanović, S. (1985). Word recognition in Serbo-Croatian and English: Do they differ? Paper presented at the Twenty-fifth Annual Meeting of the Psychonomic Society.
Seidenberg, M. S., Waters, G. S., & Barnes, M. A. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383-404.
Simpson, G. B. (1981). Meaning dominance and semantic context in the processing of lexical ambiguity. Journal of Verbal Learning and Verbal Behavior, 20, 120-136.
Spoehr, K. T. (1978). Phonological recoding in visual word recognition. Journal of Verbal Learning and Verbal Behavior, 17, 127-141.
Tanenhaus, M. K., Leiman, J. M., & Seidenberg, M. S. (1979). Evidence for multiple stages in the processing of ambiguous words in syntactic contexts. Journal of Verbal Learning and Verbal Behavior, 18, 427-440.

FOOTNOTES
*To appear in M. Noonan, P. Downing & S. Lima (Eds.), Literacy and Linguistics. Amsterdam/Philadelphia: John Benjamins Publishing Co.
†Now at the Department of Psychology, Hebrew University, Jerusalem, Israel.



Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 172-179

Phonology and Reading: Evidence from Profoundly Deaf Readers*

Vicki L. Hanson†

The prelingually, profoundly hearing-impaired reader of English is at an immediate disadvantage in that he or she must read an orthography that was designed to represent the phonological structure of English. Can the deaf reader become aware of this structure in the absence of significant auditory input? Evidence from studies with deaf college students will be considered. These studies indicate that successful deaf readers do appreciate the phonological structure of words and that they exploit this knowledge in reading. The finding of phonological processing by these deaf readers makes a strong case for the importance of phonological sensitivity in the acquisition of skilled reading, whether in hearing readers or deaf readers.

In the normal course of events, children are fluent speakers and listeners of their native tongue, English, before they begin learning to read. In this chapter, I am concerned with a population of readers for whom this is not the case. This population is prelingually, profoundly hearing-impaired. For these deaf readers, speech and lipreading are difficult to acquire and require years of instruction.

Research on reading has indicated that building on a spoken language foundation is a critical feature of reading and that in order to use an alphabetic orthography, such as English, to best advantage, the reader must go beyond the visual shape of words to apprehend their internal phonological structures (Liberman, 1971, 1973). Despite their extensive experience in using the phonology in everyday speech, evidence presented elsewhere in this monograph argues that hearing children who are poor readers may have phonological deficits that underlie their reading problem. These children have difficulty in setting up phonological structures, in apprehending such

*The writing of this paper was supported, in part, by Grant NS-18010 from the National Institute of Neurological and Communicative Disorders and Stroke. I wish to thank Carol Padden for her thoughtful reading of an earlier version of this manuscript.


structures in words, and in using a phonetic code for the storage and processing of words in working memory. The phonological deficits of these children may be fairly subtle, however, such that no difficulty in the child's speaking ability or listening comprehension may be readily apparent.

If in the hearing population even subtle phonological deficits are associated with poor reading, then how is it possible for profoundly deaf individuals to read? One might suppose that deaf persons would have difficulty with reading, and, indeed, this is the case. Surveys have consistently shown that hearing-impaired students lag significantly behind their normally-hearing counterparts in reading achievement (Conrad, 1979; Furth, 1966; Trybus & Karchmer, 1977). Although it is typical to state, based on these surveys, that the average hearing-impaired student graduating from high school reads only at about the level of a hearing child of fifth grade, that statistic obscures the even greater reading deficiency of profoundly hearing-impaired students, that is, those who could be considered truly deaf. For them, the statistics are even more discouraging. Profoundly deaf students graduating from high school read, on the average, only at the level of a normally hearing child of third grade (Conrad, 1979; Karchmer, Milone, & Wolk, 1979). Remember, though, that these reading achievement scores




represent only a population average. We find, for example, that measures of reading achievement for deaf students attending Gallaudet University may average seventh to tenth grade, with some students reading at above twelfth grade level (see, for example, Hanson, 1988; Hanson & Feldman, 1989; Hanson, Shankweiler, & Fischer, 1983; Reynolds, 1975).

These statistics on reading achievement levels of deaf students have been used by investigators to argue two opposing views of the relationship between phonology and reading. The assumption common to both views is that the hearing impairment of these students prevents access to English phonology. In one view, access to phonological information is believed to be crucial for reading, and the generally low reading achievement levels of deaf students are believed to reflect its importance. Because these readers presumably lack access to English phonology, their acquisition of reading suffers as a consequence. The second view takes the position that access to phonological information is not important in reading. The fact that some deaf individuals are able to attain fairly high reading levels is taken as evidence of this. Again, the assumption is that these readers, due to their hearing impairment, lack access to phonological information. Consequently, if they succeed at reading it must be without benefit of phonology.

Neither of these positions need be correct, however, in their interpretation of the reading achievement of deaf students. Deaf readers, despite their hearing impairment, might have access to phonology that could be used to support skilled reading. To assume that deaf readers lack access to phonology because of their deafness confuses a sensory deficit with a cognitive one. While the term phonological is often used to mean acoustic/auditory, or sound, this usage reflects a common misunderstanding of the term. Phonological units of a language are not sounds, but rather a set of meaningless primitives out of which meaningful units are formed. These primitives are related to gestures articulated by the vocal tract of the speaker (see Liberman & Mattingly, 1985, for a more detailed discussion).1 In the case of English, the deaf individual could learn about the phonology of the language from the motor events involved in speech production, through experience in lipreading, or from experience with the orthography.

As a rule, deaf children in English-speaking countries receive intensive instruction in speaking and lipreading. This is true both in schools that

use an oral educational approach (with speech being the only means of communication used in the classroom) and in schools that use a simultaneous or total communication approach (with speech being accompanied by manual communication in the classroom). Through this speech training, prelingually, profoundly deaf individuals develop varying skill in speaking and lipreading. Although some of these individuals develop quite good speaking and lipreading skills, most do not (Conrad, 1979; Smith, 1975). Speech training, nevertheless, does provide the deaf individual with a means of learning English phonology.

Speech intelligibility does not necessarily indicate, however, the extent to which a deaf reader has access to phonological information. Intelligibility reflects the degree to which a deaf speaker's speech can be understood by a listener. Among the things that can affect intelligibility are phonation and prosodic information. While such features clearly add to intelligibility, they may not be relevant for an individual's internal manipulation of phonological information. In any event, it cannot simply be assumed that deafness necessarily blocks access to English phonology. This is a question for empirical investigation.

How do congenitally, profoundly deaf readers who read well manage to do it? That is the question to be addressed here. It is possible that deaf readers read English as if it were a logographic language; namely, treating printed English words as visual characters, without taking into account the correspondences between the printed letters and the phonological structure of words. Research on the reading of Japanese and Chinese, however, has suggested that for logographic languages, as for alphabetic languages, phonetic recoding is one component of a linguistic processing system required for the task of reading (Erickson, Mattingly, & Turvey, 1977; Mann, 1985; Tzeng, Hung, & Wang, 1977). For example, Tzeng et al. (1977) found that the phonetic composition of printed Chinese characters influenced sentence processing for skilled readers of Chinese. These investigators concluded that even in cases where lexical access is possible without phonological mediation, a phonetic code is still required for effective processing in working memory.

The deaf individuals who participated in the studies to be discussed here had backgrounds in which sign was used predominantly. That is, they generally had received or were receiving instruction using sign language. Most of these individuals considered American Sign Language (ASL) to be their preferred means of communication. ASL is the common form of communication used by members of deaf communities across the United States and parts of Canada. It is a visual-gestural language that has developed independently from spoken languages and from other signed languages. For many of the subjects in the studies reported here, ASL was their first language, having been learned as a native language from deaf parents. These subjects were typically undergraduates at Gallaudet University. All were profoundly deaf. These deaf subjects, therefore, can be characterized as having higher than average reading levels and as not having been exposed to an exclusively oral background.

Findings on phonetic coding in working memory

Evidence reviewed elsewhere in this monograph indicates that hearing children who are poor readers have a language deficit that is specific to the phonological domain. For example, in tests of short-term memory, hearing poor readers recall fewer items overall and display less sensitivity to rhyme than hearing good readers (see, for example, Shankweiler, Liberman, Mark, Fowler, & Fischer, 1979). That is, on rhyming lists, the accuracy of the good readers is typically worse than on nonrhyming lists. In contrast, the accuracy of the poor readers is about the same for rhyming and nonrhyming lists. The good readers' differential performance on recall of rhyming and nonrhyming strings has been taken to mean that these readers convert the printed letters into a phonetic form and retain this phonetic information in memory. Accordingly, the finding that poor readers are not much affected by rhyming manipulations suggests that they are less able to use the phonetic information.

Is a phonetic code uniquely well-suited to the task of reading? To examine this question, we asked whether for deaf signers a different language code, one based on the structure of signs, could provide an alternative coding system for reading.

A sign of ASL is produced by a combination of the formational parameters of handshape, place of articulation, movement, and orientation (Battison, 1978; Stokoe, Casterline, & Croneberg, 1965). Evidence indicates that when signs are presented for recall in a short-term memory task, the signs are coded in terms of these formational parameters. The first line of evidence comes from studies of intrusion errors in the recall of lists of signs (Bellugi, Klima, & Siple, 1975; Krakow & Hanson, 1985). In the study by Bellugi, Klima, and Siple, lists of spoken words or signs were presented to hearing adults and deaf adults, respectively. The subjects were asked for immediate written recall. Intrusion errors for the hearing group were confusions of phonetically similar words. For example, a subject might write the word boat instead of the word presented, "vote." Errors for the deaf subjects, however, were completely different; they were confusions of the formational parameters of signs. As an example, a subject might write the word egg instead of the sign presented, NAME. The signs corresponding to the words name and egg differ only in terms of the movement of the hands (see Figure 1).

Figure 1. Formationally similar signs. Shown left to right from the top are KNIFE, EGG, NAME, PLUG, TRAIN, CHAIR, TENT, SALT. (From Hanson, 1982, p. 574.)




In addition, there is evidence that lists of formationally similar signs can produce performance decrements in serial recall tasks (Hanson, 1982; Poizner, Bellugi, & Tweney, 1981). For example, shown in Figure 1 is a set of formationally similar signs that I used in one such study (Hanson, 1982). Shown here are, left to right from the top, the signs for the words KNIFE, EGG, NAME, PLUG, TRAIN, CHAIR, TENT, and SALT. On each trial in that experiment, deaf college students were shown five of the signs from this set and were asked to remember the five signs in order. Results of that experiment indicated that fewer signs were recalled from lists made up of signs from this formationally similar set than from lists made up of signs from a formationally unrelated (control) set.

Despite such evidence that sign coding can mediate short-term recall of signs, evidence from other research does not support the notion that a sign code can serve as a viable code in the service of skilled adult reading. In another condition of that study (Hanson, 1982), I tested deaf college students in a short-term recall task of printed words. There were three types of word lists of interest here: rhyming words, orthographically similar words, and words whose signs were formationally similar. The words in the rhyming set were two, blue, who, chew, shoe, through, jew, and you. The words in the orthographically similar set were visually similar: bear, meat, head, year, learn, peace, break, and dream. While one might quarrel with the degree of visual similarity of the words in this list, it is at least true that these words are more similar visually than were the words in the rhyming list. The words in this visually similar set served as a control to ensure that any potential rhyme effects could not be attributed to the visual similarity of the printed words. The words in the formationally similar set were words whose corresponding signs were formationally similar. These were the words knife, name, plug, train, chair, tent, and salt, whose corresponding signs are shown in Figure 1. Each of these sets was paired with a control set of words. Of interest in this experiment were any differences in the ability to recall an experimental versus a control set.

The pattern of results in that experiment clearly indicates the use of phonetic coding by the deaf subjects. Whereas these subjects recalled 65.4 percent of the lists in the control condition, they recalled only 47.6 percent of the lists in the phonetically similar (rhyming) condition. There was, however, no decrement on the visually similar lists, indicating that the decrement on the rhyming lists was due to phonetic, not visual, similarity.

Interestingly, I found no evidence that the deaf college students I tested were using sign coding. Their performance on lists of words having formationally similar signs and on the control lists was comparable (52.9 percent of the control lists recalled vs. 51.4 percent of the formationally similar lists recalled). Converging evidence from later research supports this finding that the better deaf readers do not use sign coding in their recall or reading of printed English words (Lichtenstein, 1985; Treiman & Hirsh-Pasek, 1983). More than 100 years ago, Burnet (1854) argued that sign coding would be a ponderous strategy for deaf readers and, thus, limited in its use to the poorer readers. By finding that the better deaf readers do not use sign coding when processing printed English words, current research on the cognitive processing of deaf readers is consistent with Burnet's speculations.

The finding that the better readers were using phonetic coding is reminiscent of the results reported by R. Conrad (1979) in a very large scale study of deaf and hearing-impaired students in England and Wales. Conrad tested these students in a short-term memory task of rhyming and nonrhyming lists of printed letters. Comparing their performance on this memory task with measured reading ability, Conrad found that the better readers in his deaf population recalled fewer rhyming than nonrhyming lists. Thus, the better readers were using phonetic coding.

My study with deaf signers (Hanson, 1982) took Conrad's findings one step further. Conrad's subjects were from schools that generally subscribed to an oral philosophy of education. As a result, phonetic coding was the only language form available to the subjects. In my study, the deaf subjects had sign language readily available to them. In fact, all of my deaf subjects had deaf parents and reported ASL to be their first language. Yet these signers, too, skilled deaf readers, used phonetic coding in that memory task, indicating the importance of phonetic coding in short-term retention of printed material.

Sensitivity to the phonological structure of English words

Additional evidence that deaf readers can access phonological information about English words is provided by studies of individual word reading. For example, one experimental paradigm that has been shown to produce phonological effects with hearing readers uses a lexical decision task in which two letter strings are shown to the subjects on every trial, one string above the other (Meyer, Schvaneveldt, & Ruddy, 1974). The subjects must decide whether or not both of the letter strings on a trial are real English words.

In a series of three experiments, we used this paradigm with deaf college students (Hanson & Fowler, 1987). There were two types of word pairs of particular interest. As shown in Table 1, the first was pairs in which the two words rhymed. These rhyming words were spelled alike except for the first letter. The second type of word pair of interest was pairs in which the two words were spelled alike except for the first letter, but the pairs did not rhyme. It is apparent that the rhyming and nonrhyming pairs were equally similar orthographically, differing only in the phonological similarity of the two members of a pair. We tested whether there was any difference in the response times to the rhyming and nonrhyming pairs. Since, however, response times to words vary with word familiarity and orthographic regularity, it was not possible in this study to simply compare the responses to the rhyming and nonrhyming pairs. To eliminate familiarity and regularity as confounding factors, matched control conditions were used. Word pairs in the control conditions used the same words as in the rhyming and nonrhyming pairs, but were repairings of these words. Thus, the control pairs for the rhyme condition were the same words as in the rhyme condition, just paired now with different words. For example, in the rhyme condition, the words save-wave and fast-past were paired together, while in the rhyme control, save-past were paired together and fast-wave were paired together. Similarly, the control pairs for the nonrhyme condition were the same words as in the nonrhyme condition, just paired with different words. By comparing each word in the rhyme and nonrhyme conditions with itself in a control condition, any effects of word frequency and regularity were eliminated.

Table 1. Rhyming and nonrhyming pairs and their matched controls.

Rhyming Pairs       Nonrhyming Pairs
save-wave           have-cave
fast-past           last-east

Rhyming Controls    Nonrhyming Controls
save-past           have-east
fast-wave           last-cave

Source of data: From Hanson & Fowler, 1987, Experiment 2.
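The re-pairing scheme behind these controls can be made concrete with a short sketch. The following Python fragment is a minimal illustration of the design, not code from Hanson and Fowler (1987); the function name is invented, and the word lists are simply the Table 1 examples:

    # Illustrative sketch of the control re-pairing design (hypothetical code).
    # Each word from an experimental pair reappears in a control pair,
    # partnered with a word from a different experimental pair.
    def make_control_pairs(pairs):
        n = len(pairs)
        # Re-partner the first member of each pair with the second member
        # of the next pair, wrapping around at the end of the list.
        return [(pairs[i][0], pairs[(i + 1) % n][1]) for i in range(n)]

    rhyme_pairs = [("save", "wave"), ("fast", "past")]   # from Table 1
    print(make_control_pairs(rhyme_pairs))
    # -> [('save', 'past'), ('fast', 'wave')], the rhyme controls of Table 1

Because every word serves as its own control, condition differences cannot be driven by word frequency or orthographic regularity.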

The predictions for this experiment are shown in Table 2. If readers in this task did not access phonological information, then there should have been no effect due to the phonological relationships between words in a pair. That is, if the readers were using solely orthographic information, then the first equation shown here would hold; namely, the difference between response times to rhyming pairs and the rhyme controls would equal the difference between response times to nonrhyming pairs and their controls. Thus, response times would be the same whether or not the words of a pair rhymed.

Table 2. Predictions in the Hanson & Fowler, 1987 study.

If ORTHOGRAPHIC CODING, then:
Control - Rhyming = Control - Nonrhyming

If PHONOLOGICAL CODING, then:
Control - Rhyming ≠ Control - Nonrhyming

If, however, readers were accessing phonological information, then there would be a difference in response times as a function of the phonological relationships between words in a pair. Access to phonological information would be indicated if the second equation held; namely, the two differences in response times would not be equal. In that event, the response times would be affected by the rhyming manipulation.

For the deaf college students we tested, there was an effect of the phonological relationship between the words in a pair. Shown in Table 3 are the response times from one experiment of that study (Experiment 2). Response times were faster for the rhyming pairs than for the matched controls. In contrast, response times were slower on the nonrhyming pairs than on the matched controls. Since the rhyming and nonrhyming pairs were equally similar orthographically, this significant difference in response times for the rhyming and nonrhyming pairs was not due to orthographic influences. As a consequence, the difference in response times to the rhyming and nonrhyming pairs could be unambiguously attributed to their discrepant phonological structures. Thus, these good deaf readers, like hearing readers, accessed phonological information when reading words.

Most impressive is the finding that the deaf subjects in that study were not only accessing phonological information, but were doing so in a highly speeded task. It might be supposed that deaf readers would be able to access phonological information only in situations in which they have time to laboriously recover learned pronunciations. In this research, however, we found that they accessed phonological information quite rapidly, suggesting that accessing such information is a fundamental property of reading for these skilled readers.

Table 3. The response time (RT) difference for the rhyming and nonrhyming pairs and their matched controls for deaf college students.

Control - Rhyming (52 ms) ≠ Control - Nonrhyming (-15 ms)

Source of data: From Hanson & Fowler, 1987, Experiment 2.
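The logic of Tables 2 and 3 amounts to a simple contrast between two difference scores. Below is a worked check in Python, using only the two values reported in Table 3; the variable names are mine, not the authors':

    # The two difference scores reported in Table 3 (in ms).
    rhyme_diff = 52       # control RT - rhyming RT: rhyming pairs were faster
    nonrhyme_diff = -15   # control RT - nonrhyming RT: nonrhyming pairs were slower

    # Orthographic-coding prediction (Table 2): the differences are equal,
    # since rhyming and nonrhyming pairs are equally similar orthographically.
    print(rhyme_diff == nonrhyme_diff)   # False
    # Phonological-coding prediction: the differences diverge.
    print(rhyme_diff != nonrhyme_diff)   # True, consistent with phonological coding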

In more recent work, we have found other evidence that skilled deaf readers are sensitive to the phonological structure of words. For example, deaf college students, when asked to think of words that rhyme with a specific target word, have been found to be able to do so (Hanson & McGarr, 1989). In addition, we have found that deaf college students are able to apply principles of grapheme-phoneme correspondence in generating the correct pronunciation of letter strings not previously encountered, a skill underlying the acquisition of new words. In this latter task (Hanson, 1989), I tested these students on their reading of orthographically possible nonwords, that is, pseudowords. The critical test was between pseudowords such as flaim that were homophonous with an actual English word (flame) and control pseudowords. These controls were orthographically matched pseudowords that were not homophonous with an actual English word (e.g., proom). Examples of stimuli from the task are shown in Table 4.

Table 4. Examples of pseudohomophones and control pseudowords.

English Word    Pseudohomophone    Control
flame           flaim              proom
dog             daug               gine
spoon           spune              fosh
tall            tawl               trate
tone            toan               vail
blue            bloo               nolo
moon            mune               rune

Source: Selection of stimulus items from Hanson, 1989, based on Macdonald, 1988.

In two experiments using different lists of pseudowords, a paper and pencil task tested whether subjects could identify which of several pseudowords were homophonous with English words. The actual instructions to subjects were that they were to indicate whether or not each of the "nonsense words" was pronounced like a real English word. In both experiments, deaf college students were able to make this judgment with better than chance accuracy, although they were not as accurate as the hearing subjects. As an additional aspect of this pseudohomophone task, subjects in one of the two experiments were asked to indicate which English word they thought a pseudoword was pronounced like, if they had indicated that they thought it was pronounced like one. In this second task, deaf subjects were usually able to supply the correct English word.
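A natural way to formalize "better than chance" for such a yes/no judgment is a binomial test against a 0.5 guessing rate. The sketch below is illustrative only; the counts are invented for the example, not Hanson's (1989) data:

    # Hypothetical check of above-chance accuracy on a yes/no judgment task.
    from scipy.stats import binomtest

    n_items = 100    # total pseudowords judged (invented count)
    n_correct = 70   # correct homophony judgments (invented count)

    # One-sided test: is accuracy reliably above the 50% guessing rate?
    result = binomtest(n_correct, n_items, p=0.5, alternative="greater")
    print(f"p = {result.pvalue:.4f}")   # a small p indicates above-chance accuracy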

Studies on individual word reading thus indicate that it is possible for deaf readers to have access to English phonology. This does not mean that such access is easy for these readers. Nor does it mean that all or even most deaf readers are able to use this information. The point, rather, is that hearing loss alone does not preclude access to phonology. In addition, it is important to note that the better deaf readers generally take advantage of this phonological information.

Why phonological coding?

In sum, the evidence that has been summarized here indicates that it is possible for deaf readers to use phonology. The use of phonological information tends to be characteristic of deaf good readers, whether they are beginning readers (Hanson, Liberman, & Shankweiler, 1984), high school students (Conrad, 1979; McDermott, 1984), or college students (Hanson, 1982; Hanson & Fowler, 1987; Lichtenstein, 1985). Why would the better deaf readers use this type of linguistic information when reading? One possibility has to do with the structural properties of particular languages. In English, where word order is relatively fixed, grammatical structuring is essentially sequential. A phonological code may be an efficient medium for retaining the sequential information that is represented in English.

Deaf individuals have specific difficulty in the recall of temporally sequential information (Hanson, in press). Studies have consistently found that the measured memory span of deaf individuals is shorter than that of hearing persons (see, for example, Bellugi et al., 1975; Blair, 1957; Belmont & Karchmer, 1978; Conrad, 1979; Hanson, 1982; Kyle, 1980; Pintner & Paterson, 1917; Wallace & Corballis, 1973). It is important to note that this finding of a short span applies not only to English materials (e.g., lists of words, letters, or digits), but also to studies that have measured serial recall of signs. Fairly typical results were found in the Bellugi et al. (1975) study, in which deaf adults' correct serial recall of signs reached an asymptote at a list length of four signs, while the hearing subjects reached asymptote with lists of six words. Thus, the differences in memory span found between hearing and deaf individuals appear to be due not simply to unfamiliarity with the English material; rather, they appear to be related to cognitive processes involved in short-term memory for linguistic materials in general.

The ability to maintain a sequence of words in short-term memory is related to the use of phonological coding. That is, studies with orally trained subjects (Conrad, 1979), native signers of ASL (Hanson, 1982), and subjects mixed in terms of their educational and linguistic backgrounds (Lichtenstein, 1985) have all found strong correlations between the magnitude of the rhyme effect for deaf subjects and measured memory span. In these studies, the larger the rhyme effect for a deaf subject, the larger that subject's memory span. In contrast, no correlation between use of manual coding and measured memory span has been established (Lichtenstein, 1985).
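The correlational analyses described above can be sketched as follows. The data here are invented for illustration; only the computation (a per-subject rhyme effect correlated with memory span) follows the description in the text:

    # Hypothetical sketch of the rhyme-effect/memory-span correlation.
    import numpy as np
    from scipy.stats import pearsonr

    # Invented per-subject percent-correct recall and memory spans.
    nonrhyme_recall = np.array([70, 65, 80, 55, 60])    # nonrhyming lists
    rhyme_recall    = np.array([50, 52, 55, 50, 48])    # rhyming lists
    memory_span     = np.array([5.5, 4.8, 6.1, 4.2, 4.5])

    # A larger effect indicates heavier reliance on phonological coding.
    rhyme_effect = nonrhyme_recall - rhyme_recall
    r, p = pearsonr(rhyme_effect, memory_span)
    print(f"r = {r:.2f}, p = {p:.3f}")   # a positive r, as reported in the text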

Given this relationship between serial recall ability and phonological coding, we have suggested that one reason the skilled deaf reader uses phonological coding may have to do with the critical syntactic role played by sequential structuring in English (Hanson, 1982; Lake, 1980; Lichtenstein, 1985). This analysis suggests that an issue to be faced by teachers is how to educate deaf students to process a highly temporally structured language such as English.

Deaf readers and phonology

It is notable that the subjects in the studies discussed here were not generally from oral backgrounds. In some cases, subjects were expressly selected because they were native signers of ASL. Yet even these subjects, if skilled readers, were found to be using phonological information in the reading of English, rather than referring to ASL.

A discussion of phonological sensitivity in deaf readers always leads to the question of how this sensitivity is acquired. It is likely that congenitally, profoundly deaf readers acquire phonology from a combination of three sources: experience with the orthography through reading, experience in speaking, and experience in lipreading. In many of the studies discussed here, there was evidence of phonological processing for deaf subjects whose speech was not intelligible. That even these subjects use phonological coding suggests that deaf individuals' ability to use phonological information when reading is not well reflected in the intelligibility ratings of their speech. Further research is needed to determine the type of language instruction capable of promoting access to the speech skills most relevant to reading.

When this chapter was first planned, it was titled "Is reading different for deaf individuals?" The answer appears to be both yes and no. Clearly the answer is yes in the sense that deaf readers bring to the task of reading very different sets of language experiences than the hearing child. These differences will require special instruction. But the answer is also no. The evidence indicates that skilled deaf readers use their knowledge of the structure of English when reading. Although sign coding, in theory, might be used as an alternative to phonological coding for deaf signers, research using various short-term memory and reading tasks has found little evidence that words are processed with reference to sign by the better deaf readers. Rather, the better deaf readers, like the better hearing readers, have learned to abstract phonological information from the orthography, despite congenital and profound hearing impairment.

The finding of phonological processing by deaf readers, particularly deaf readers skilled in ASL, makes a strong case for the importance of phonological sensitivity in the acquisition of skilled reading, whether the reader is hearing or deaf. For deaf readers, the acquisition and use of phonological information is extremely difficult. They would be expected to use alternatives such as a visual (orthographic) or sign strategy, if such alternatives were effective. Yet the evidence indicates that successful deaf readers do not rely on these alternatives.

REFERENCES

Battison, R. (1978). Lexical borrowing in American Sign Language. Silver Spring, MD: Linstok Press.

Bellugi, U., Klima, E. S., & Siple, P. (1975). Remembering in signs. Cognition, 3, 93-125.

Belmont, J. M., & Karchmer, M. A. (1978). Deaf people's memory: There are problems testing special populations. In M. Gruneberg & R. Sykes (Eds.), Practical aspects of memory (pp. 581-588). London: Academic Press.

Blair, F. X. (1957). A study of the visual memory of deaf and hearing children. American Annals of the Deaf, 102, 254-263.

Burnet, J. R. (1854). The necessity of methodical signs considered. American Annals of the Deaf, 7, 1-14.

Conrad, R. (1979). The deaf schoolchild. London: Harper & Row.

Erickson, D., Mattingly, I. G., & Turvey, M. T. (1977). Phonetic activity in reading: An experiment with Kanji. Language and Speech, 20, 384-403.

Furth, H. G. (1966). A comparison of reading test norms of deaf and hearing children. American Annals of the Deaf, 111, 461-462.

Hanson, V. L. (1982). Short-term recall by deaf signers of American Sign Language: Implications for order recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 572-583.

Hanson, V. L. (1989). On the use of grapheme-phoneme correspondences by deaf readers. Manuscript.

Hanson, V. L. (in press). Recall of order information by deaf signers: Cues to memory span deficits. Memory & Cognition.

Hanson, V. L., & Feldman, L. B. (1989). Language specificity in lexical organization: Evidence from deaf signers' lexical organization of ASL and English. Memory & Cognition, 17, 292-301.

Hanson, V. L., & Fowler, C. A. (1987). Phonological coding in word reading: Evidence from hearing and deaf readers. Memory & Cognition, 15, 199-207.

Hanson, V. L., Liberman, I. Y., & Shankweiler, D. (1984). Linguistic coding by deaf children in relation to beginning reading success. Journal of Experimental Child Psychology, 37.

Hanson, V. L., & McGarr, N. S. (1989). Rhyme generation by deaf adults. Journal of Speech and Hearing Research, 32, 2-11.

Hanson, V. L., Shankweiler, D., & Fischer, F. W. (1983). Determinants of spelling ability in deaf and hearing adults: Access to linguistic structure. Cognition, 14, 323-344.

Karchmer, M. A., Milone, M. N., Jr., & Wolk, S. (1979). Educational significance of hearing loss at three levels of severity. American Annals of the Deaf, 124, 97-109.

Krakow, R. A., & Hanson, V. L. (1985). Deaf signers and serial recall in the visual modality: Memory for signs, fingerspelling, and print. Memory & Cognition, 13, 265-272.

Kyle, J. G. (1980). Sign coding in short term memory in the deaf. In B. Bergman & I. Ahlgren (Eds.), Proceedings of the First International Symposium on Sign Language Research. Stockholm: Swedish National Association of the Deaf.

Lake, D. (1980). Syntax and sequential memory in hearing impaired children. In H. N. Reynolds & C. M. Williams (Eds.), Proceedings of the Gallaudet Conference on Reading in Relation to Deafness. Washington, DC: Division of Research, Gallaudet College.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.

Liberman, I. Y. (1971). Basic research in speech and lateralization of language: Some implications for reading disability. Bulletin of the Orton Society, 21, 71-87.

Liberman, I. Y. (1973). Segmentation of the spoken word and reading acquisition. Bulletin of the Orton Society, 23, 65-77.

Lichtenstein, E. H. (1985). Deaf working memory processes and English language skills. In D. S. Martin (Ed.), Cognition, education, and deafness: Directions for research and instruction (pp. 111-114). Washington, DC: Gallaudet College Press.

Macdonald, C. J. M. (1988). Can oral-deaf readers use spelling-sound information to decode? Manuscript.

Mann, V. A. (1985). A cross-linguistic perspective on the relation between temporary memory skills and early reading ability. Remedial and Special Education, 6, 37-42.

McDermott, M. J. (1984). The role of linguistic processing in the silent reading act: Recoding strategies in good and poor deaf readers. Doctoral dissertation, Brown University, Providence, RI.

Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. G. (1974). Functions of graphemic and phonemic codes in visual word-recognition. Memory & Cognition, 2, 309-321.

Pintner, R., & Paterson, D. G. (1917). A comparison of deaf and hearing children in visual memory for digits. Journal of Experimental Psychology, 2, 76-88.

Poizner, H., Bellugi, U., & Tweney, R. D. (1981). Processing of formational, semantic, and iconic information in American Sign Language. Journal of Experimental Psychology: Human Perception and Performance, 7, 1146-1159.

Reynolds, H. N. (1975). Development of reading ability in relation to deafness. In Proceedings of the World Congress of the World Federation of the Deaf: Full Citizenship for All Deaf People (pp. 273-284). Washington, DC: National Association of the Deaf.

Shankweiler, D., Liberman, I. Y., Mark, L. S., Fowler, C. A., & Fischer, F. W. (1979). The speech code and learning to read. Journal of Experimental Psychology: Human Learning and Memory, 5, 531-545.

Smith, C. R. (1975). Residual hearing and speech production in deaf children. Journal of Speech and Hearing Research, 18, 795-811.

Stokoe, W. C., Jr., Casterline, D., & Croneberg, C. (1965). A dictionary of American Sign Language. Washington, DC: Gallaudet College Press.

Treiman, R., & Hirsh-Pasek, K. (1983). Silent reading: Insights from congenitally deaf readers. Cognitive Psychology, 15, 39-65.

Trybus, R., & Karchmer, M. (1977). School achievement scores of hearing impaired children: National data on achievement status and growth patterns. American Annals of the Deaf, 122, 62-69.

Tzeng, O. J. L., Hung, D. L., & Wang, W. S-Y. (1977). Speech recoding in reading Chinese characters. Journal of Experimental Psychology: Human Learning and Memory, 3, 621-630.

Wallace, C., & Corballis, M. C. (1973). Short-term memory and coding strategies in the deaf. Journal of Experimental Psychology, 99, 334-348.

FOOTNOTES

*In D. Shankweiler & I. Y. Liberman (Eds.), Phonology and reading disability: Solving the reading puzzle (pp. 69-89). Ann Arbor: University of Michigan Press (1989).

†Now at IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, New York.

1The term phonology need not be limited to use with spoken languages. In the case of American Sign Language, for example, the term phonology has been used to describe the linguistic primitives related to the visible gestures made by the hands, face, and body of the signer. In the present chapter, however, phonology will be restricted in reference only to features related to spoken languages.



Haskins Laboratories Status Report on Speech Research
1989, SR-99/100, 180-194

Syntactic Competence and Reading Ability in Children*

Shlomo Bentin,† Avital Deutsch,† and Isabelle Y. Liberman††

The effect of syntactic context on auditory word identification and on the ability to detect and correct syntactic errors in speech was examined in severely reading disabled children and in good and poor readers selected from the normal distribution of fourth graders. The poor readers were handicapped when correct reading required analysis of the sentence context. However, their phonological decoding ability was intact. Identification of words was less affected by syntactic context in the severely disabled readers than in either the good or the poor readers. Moreover, the disabled readers were inferior to good readers in judging the syntactic integrity of spoken sentences and in their ability to correct the syntactically aberrant sentences. Poor readers were similar to good readers in the identification and judgment tasks, but inferior in the correction task. The results suggest that the severely disabled readers were inferior to both good and poor readers in syntactic awareness and in the ability to use syntactic rules, while poor readers were equal to good readers in syntactic awareness but were relatively impaired in using syntactic knowledge productively.

Fluent reading involves a complex interaction of several parallel processes that relate visual graphemic stimuli to specific entries in the lexicon and combine the semantic and syntactic information contained in those entries to apprehend the meaning of sentences. Some of these processes relate to the decoding of the phonological code from print, while others relate to the assignment of meaning to the phonological units. Although the decoding of the phonological code can, in principle, be based solely on "bottom-up" application of grapheme-to-phoneme transformation rules, it is well documented that this process is supported by "top-down" streaming of lexical knowledge and contextual information. The common denominator of the "bottom-up" and "top-down" processes in reading is that both are components of the human linguistic endowment (see Perfetti, 1985; Rozin & Gleitman, 1977).

This study was supported by a grant from the Israel Foundations Trustees to Shlomo Bentin. Isabelle Liberman was supported in part by National Institute of Child Health and Human Development Grant HD-01994 to Haskins Laboratories. The useful comments of Anne Fowler are much appreciated. We gratefully acknowledge the cooperation of the principals and teachers of the integrative schools "Luria" and "Steen" in Jerusalem, and of the center for remediation teaching in Hertzelia.


The relative contribution of context-dependent (top-down) processes to visual word recognition is determined by many factors, among which reader competence is particularly important. Although some authors have argued that as fluency develops, the reader increasingly relies on contextual information during word recognition (Smith, 1971), more recent studies have disconfirmed this hypothesis. For example, semantic priming facilitates lexical decisions more in children than in adults and more in younger than in older children (Schvaneveldt, Ackerman, & Semlear, 1977; West & Stanovich, 1978). Similarly, context effects are greater in poor readers than in good readers both in lexical decision (Schwantes, Boesl, & Ritz, 1980) and naming tasks (Perfetti, Goldman, & Hogaboam, 1979; Stanovich, West, & Feeman, 1981). Within the same subject, larger context effects occur when bottom-up processes are inhibited by degrading or masking the target stimuli (e.g., Becker & Killion, 1977; Massaro, Jones, Lipscomb, & Scholz, 1978; Meyer, Schvaneveldt, & Ruddy, 1975). These results imply that the increased role of higher-level contextual processes in visual word recognition is caused by a need to compensate for deficiencies in lower-level decoding processes, such as those that occur with poor readers or degraded stimuli (Perfetti & Roth, 1981; Stanovich et al., 1981).

The observation that context effects are larger in poor than in good readers does not imply, however, that poor readers are better or more efficient users of contextual information than good readers. In fact, the opposite may be true. Both Perfetti et al. (1979) and Schwantes et al. (1980) reported that when subjects are required to use the context for predicting the target word before seeing it, skilled readers do better than poor readers. The magnitude of semantic priming effects in visual word recognition may therefore be a rather poor indicator of the real ability to use contextual information. A relatively unbiased method for comparing good and poor readers' awareness of contextual information would be to evaluate context effects on word recognition in situations that either eliminate the need for decoding of print, as in auditory presentation, or that force all readers, regardless of skill, to use contextual processes to the same extent.

As a language in which efficient use of contextual information is essential for fluent reading, Hebrew provides an excellent channel through which to examine syntactic effects. In Hebrew orthography, letters represent mostly consonants, while vowels are represented by diacritical marks placed below, within, or above the letters. The vowel marks are usually omitted in writing except in poetry, holy scripture, and children's literature. Because different words may be represented by the same consonants but different vowels, when these vowels are absent up to seven, eight, or more different words may be represented by the same string of letters. In addition to being semantically ambiguous, these Hebrew homographs are also phonemically equivocal, because the (absent) vowels of the words that are responsible for the different meanings vary from word to word. Therefore, fluent reading of "unvoweled" Hebrew requires heavy reliance on contextual information.

Reading instruction in Hebrew starts, as a rule, with the "voweled" orthographic system in which the diacritical marks are presented with the consonant letters. The vowels are gradually omitted from school texts starting at the beginning of the third grade. During the third grade, the children begin to learn to read without vowels. By grade four, they are expected to be fluent readers of unvoweled texts. Informal discussions with teachers, however, revealed that the transition from reading voweled to reading unvoweled material is not equally easy for all children. According to teachers, some children are good readers as long as the diacritical marks are present but are slow in acquiring the skill of reading without the vowels. Because without vowels the context of the sentence is a primary source of phonological constraints on reading, we suspected that the children in this group, although knowing the grapheme-to-phoneme transformation rules, do not (or cannot) use contextual information efficiently. Thus, in spite of being phonologically skilled, those children may be poor readers. Hebrew may therefore be a convenient medium through which to test the hypothesis that at least some deficient readers are less able than good readers to use context.

We decided to manipulate syntactic rather than semantic context in the present research. The main reason for this decision is that syntax is probably a more basic linguistic ability than semantics (see Chomsky, 1969) and less affected by reading experience (Lasnik & Crain, 1985). In addition, syntactic violations are more clearly defined than are manipulations of semantic association strength.

The effect of prior syntactic structure on the processing of a visually presented target word has been investigated in a number of studies. Lexical decisions regarding target words are faster when they are preceded by syntactically appropriate primes than when preceded by syntactically inappropriate primes (Goodman, McClelland, & Gibbs, 1981; Lukatela, Kostić, Feldman, & Turvey, 1983). Lexical decision (West & Stanovich, 1986; Wright & Garrett, 1984) and naming (West & Stanovich, 1986) are facilitated when targets are syntactically congruent with previously presented sentence fragments relative to when targets follow a syntactically neutral context.

Although the syntactic context effect on word recognition is reliable, information on the relation between this effect and reading ability is comparatively scarce and controversial (for a review see Vellutino, 1979). Several studies report that poor readers are inferior to normal readers in dealing with complex syntactic structures in speech (Brittain, 1970; Byrne, 1981; Cromer & Wiener, 1966; Goldman, 1976; Guthrie, 1973; Newcomer & Magee, 1977; Vogel, 1974). Other authors, however, have challenged a simple interpretation of these results (Glass & Perna, 1986). For example, Shankweiler and Crain (1986) have suggested that the poor readers' apparent deficiency in processing complex syntactic information may be an epiphenomenon of limitations of the working memory processor, rooted in a difficulty in generating phonological codes.

The present study sought to examine the relation between word recognition and syntactic awareness, i.e., sensitivity to syntactic structure and the ability to use syntactic knowledge explicitly, in children who vary in reading competence. Experiment 1, using a group of adult fluent readers of Hebrew, was designed to establish the validity of the procedure of testing the effect of syntactic context on the identification of auditorily presented words masked by white noise. Experiment 2 examined syntactic context effects in two groups of children. One group was composed of children with learning disorders drawn from a population of students who were selected by the school system for special supplementary training in reading; a comparison group was formed of good readers drawn from the population of fourth-graders of two elementary schools. In Experiment 3, the same group of good readers was compared with a group of poor readers from the same elementary schools. The poor readers were matched with the good readers for their ability to apply grapheme-to-phoneme transformation rules in reading voweled pseudowords, but they were significantly inferior in reading sentences when the words were printed without the diacritical vowel marks.

EXPERIMENT 1

The purpose of the present experiment was to establish the effect of syntactic context on the identification of auditorily presented words that were masked by white noise. The auditory modality was used to attenuate the deficient readers' excessive reliance on contextual information in reading (which is presumably caused by the need to compensate for their difficulty in decoding the print). Stimulus masking was incorporated in the procedure because previous studies suggest that degradation increases the tendency of all subjects to use contextual information for word recognition (Becker & Killion, 1977; Stanovich & West, 1981). We chose to use identification rather than reaction time measures in order to keep the measurement as simple and direct as possible.

Subjects were presented with a list of three- or four-word sentences that were pre-recorded on tape. In each sentence, white noise was superimposed on one or several (target) words. The subjects were instructed to identify the masked words. Half the targets in the list were congruent with the syntactic structure of the sentence in which they appeared, whereas the other targets were incongruent, that is, caused a syntactic violation. We predicted that the percentage of correctly identified targets would be higher for syntactically congruent words than for syntactically incongruent words.

Method

Subjects

The subjects were 28 undergraduate students (14 males) who participated in the experiment for course credit or for payment. They were all native speakers of Hebrew with normal hearing.

Test Materials

The auditory test included 104 three- or four-word sentences. Each sentence was used in two forms: a) syntactically correct and b) syntactically incorrect. The incorrect sentences were constructed by changing the correct sentences in one of the following 10 ways.

Type 1 - In this category there were 12 sentences in which the gender compatibility between the subject and the predicate was altered. In six of these sentences a masculine subject was presented with a feminine predicate, and in the other six a feminine subject was presented with a masculine predicate. The masked target was the predicate, which was always the last word in the sentence.

Type 2 - In this category there were 12 sentences in which the compatibility of number between subject and predicate was altered. In six of these sentences a singular predicate followed a subject in plural form, and vice-versa in the other six. The masked target was the predicate, which was the last word in each sentence.

Type 3 - There were eight sentences in this category. The compatibility of gender and number between the subject and the predicate was altered in each sentence. The masked target was the predicate, which was the last word in each sentence.

Type 4 - In Hebrew, prepositions related to personal pronouns (e.g., "on me") become one word. The violation in this category consisted of the decomposition of the composed pronoun into two separate words (the pronoun and the preposition), which kept the meaning but were syntactically incorrect. Eight different prepositions were used, four times each, totaling 32 sentences. The masked target words were the composed or decomposed pronouns. Because the decomposition of the pronoun and preposition added one word to the sentence, another word was excluded from each of the incorrect sentences.

The syntactic violations of types 5 to 10 were based on changing the compulsory order of parts of the sentence. Therefore, the sentences of these types were masked from the first to the last word.

Type 5 - In the ten sentences of this type, the order of the attribute and its nucleus was reversed.

Type 6 - In each of the six three-word sentences of this type, the predicate was incorrectly introduced between the subject and its attribute.

Type 7 - In Hebrew, the negation always comes before the negated predicate. We altered this fixed order in six sentences, using three different negation words.

Type 8 - In six sentences, the interrogative word was moved from its fixed place at the beginning of the sentence to the second place.

Type 9 - In six sentences, the fixed order of preposition and noun was reversed so that the noun appeared before the preposition.

Type 10 - In six sentences, the copula that should occur between the subject and the predicate was moved to the beginning or the end of the sentence.

All 104 correct and 104 incorrect sentences were recorded on tape by a female native speaker of Hebrew. The tapes were sampled at 20 kHz. The masked intervals were marked, and white noise was digitally added to the marked epochs with a signal-to-noise ratio of 1:2.75. This ratio was determined on the basis of pilot tests so that the correct target identification level was about 50%.
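The masking operation itself is straightforward to reconstruct. The sketch below is my own approximation, not the authors' code, and it assumes the 1:2.75 signal-to-noise ratio is defined over RMS amplitude within the masked epoch:

    # Sketch of digitally adding white noise to a marked epoch at a fixed SNR.
    # Assumption: the 1:2.75 ratio refers to RMS amplitude (signal : noise).
    import numpy as np

    def mask_epoch(signal, start, stop, snr=1 / 2.75, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        out = signal.copy()
        epoch = out[start:stop]
        rms = np.sqrt(np.mean(epoch ** 2))               # signal RMS in the epoch
        noise = rng.normal(0.0, rms / snr, epoch.shape)  # noise RMS = rms / snr
        out[start:stop] = epoch + noise
        return out

    fs = 20_000                                             # 20 kHz sampling rate (from text)
    speech = np.random.default_rng(0).normal(size=3 * fs)   # stand-in for a recorded sentence
    masked = mask_epoch(speech, start=fs, stop=2 * fs)      # mask the middle second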

The 208 sentences were organized into two lists. In each list, 52 sentences were syntactically correct and 52 sentences were syntactically incorrect. Each sentence appeared in each list only once, either in correct or incorrect form. Sentences that were correct in list A were incorrect in list B, and vice versa. Fourteen subjects were tested with list A and the other 14 with list B. Thus, each subject listened to an equal number of correct and incorrect sentences, and across subjects each sentence appeared an equal number of times in each form.
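This list construction is a standard counterbalancing scheme. A minimal sketch, with invented names, that realizes the assignment just described:

    # Minimal counterbalancing sketch: each sentence appears once per list,
    # correct in one list and incorrect in the other (illustrative code).
    def build_lists(n_sentences=104):
        list_a, list_b = [], []
        for i in range(n_sentences):
            if i % 2 == 0:
                list_a.append((i, "correct"))
                list_b.append((i, "incorrect"))
            else:
                list_a.append((i, "incorrect"))
                list_b.append((i, "correct"))
        return list_a, list_b

    list_a, list_b = build_lists()
    assert sum(form == "correct" for _, form in list_a) == 52   # 52 correct per list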

The sentences in each list were randomized and re-recorded on tape. Subjects listened to the tapes via Sennheiser earphones (HD-420).

Procedure

The subjects were tested individually in a quiet room. The experimenter listened to the stimuli simultaneously with the subject and stopped the tape-recorder at the end of each sentence. The subjects were asked to repeat the masked part of each sentence and were encouraged to guess whenever necessary. The subjects' responses were recorded manually by the experimenter. Subjects were randomly assigned to List A or B.

Results and Discussion

The average percentage of correct responses, across subjects and sentence types, was 41.3%. Overall, correct identification was 67.0% for the syntactically correct sentences but only 15.6% for the syntactically incorrect sentences. The syntactic context effect was evident for each type of syntactic violation (Table 1).

Table 1. Percentage of correct identification of syntactically correct and syntactically incorrect sentences in each violation-type category (see text).

                            Type of syntactic violation
                    1      2      3      4      5      6      7      8      9     10

Syntactically correct
    Mean          60.7   72.6   67.9   83.2   54.3   67.9   71.5   69.0   64.3   58.4
    (SEm)          6.2    4.4    4.5    1.7    4.6    6.6    7.1    6.9    6.1    1.0

Syntactically incorrect
    Mean          18.7   25.6    2.7   20.6   12.2   10.7   14.3    8.2   19.0   23.7
    (SEm)          4.9    5.0    2.6    5.2    4.2    6.9    6.6    1.4    7.6    6.3




These observations were confirmed by two-factor analyses of variance with subjects and sentences as random variables. The factors were syntactic context (correct, incorrect) and syntactic violation type (Types 1 to 10). The main effect of syntactic context was significant (F(1,26)=588.44, MSe=629, p < .0001 for the subject analysis, and F(1,91)=140.93, MSe=624, p < .0001 for the stimulus analysis). The main effect of sentence type was also significant (F(9,234)=4.5, MSe=404, p < .0001 for the subject analysis, and F(9,91)=2.21, MSe=437, p < .03 for the stimulus analysis). The context effect was conspicuous for all syntactic violation types, but, as suggested by an interaction between the two factors, its magnitude differed. This interaction was significant for the subject analysis (F(9,234)=4.73, MSe=322, p < .0001), but only marginal for the stimulus analysis (F(9,91)=1.86, MSe=624, p < .07). A Tukey-A post hoc analysis of the interaction revealed that the context effect was greater for type 3 (gender and number), type 4 (composite pronoun), type 8 (translocation of interrogative word), type 6 (separation of subject and attribute), and type 7 (translocation of negation word) than for all the other sentence types. Within these two groups, the context effects were similar.
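Dual analyses of this kind, run once over subject means and once over sentence (item) means, can be sketched with a repeated-measures ANOVA. The data file, layout, and column names below are hypothetical, and this sketch only approximates the paper's random-effects treatment:

    # Sketch of by-subjects (F1) and by-items (F2) ANOVAs (hypothetical data).
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Assumed layout: one row per subject x sentence, with columns
    # subject, sentence, context ("correct"/"incorrect"), vtype (1-10), acc (0/1).
    trials = pd.read_csv("identification_trials.csv")   # hypothetical file

    # F1: average over sentences; subjects are the random factor.
    by_subj = trials.groupby(["subject", "context", "vtype"], as_index=False)["acc"].mean()
    f1 = AnovaRM(by_subj, depvar="acc", subject="subject",
                 within=["context", "vtype"]).fit()

    # F2: average over subjects; sentences are the random factor.
    # Context is within-items here, since each sentence has both forms.
    by_item = trials.groupby(["sentence", "context"], as_index=False)["acc"].mean()
    f2 = AnovaRM(by_item, depvar="acc", subject="sentence", within=["context"]).fit()
    print(f1)
    print(f2)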

The results of Experiment 1 demonstrate that identification of words in sentences is influenced by the syntactic coherence of the sentence. Because, across subjects, exactly the same words were masked and had to be identified in the syntactically correct and incorrect sentences, the difference in the correct identification rate between the two modes of presentation is probably due to the manipulation of syntactic coherence. The magnitude of this effect seemed to vary across different types of syntactic anomalies, but it was reliable and statistically significant for each type. Because we had neither a priori predictions about the effects of particular violation types on identification nor clear post hoc explanations for the observed differences, and because the type of violation is not directly relevant to the issues investigated in this study, the syntactic violation types will be collapsed in all further analyses.

On the basis of these results, we can use the technique of Experiment 1 to assess possible differences in the magnitude of the effect in good and poor readers.

EXPERIMENT 2

Experiment 2 compared the magnitude of the syntactic context effect on the identification of auditorily presented words in children who are good readers and children with a severe reading disability. As was elaborated in the introduction, although several studies reported that poor readers are deficient in syntactic comprehension (see Vellutino, 1979), others could not find solid evidence to support this hypothesis (e.g., Glass & Perna, 1986). If disabled readers are less aware of the syntactic structure of the sentence (as part of their general linguistic handicap), or do not use syntactic information as efficiently as good readers, syntactic context effects should be weaker in disabled than in good readers. Consequently, the effect of syntactic congruity on correct identification of sentences should be smaller in disabled than in good readers.

A second prediction concerns the nature of errors made by good and disabled readers in the identification of words presented in syntactically incorrect sentences. The auditory mask probably induces some degree of uncertainty in the auditory input. If listeners are aware of the sentence context, they may attempt to use it to complement the information that is missing in the auditory stream. Such a strategy would cause errors in the identification of words that violate the syntactic structure, because in those sentences the target does not conform to the expected syntactic rules. Therefore, errors in identification that are induced by syntactic awareness should reflect the use of correct syntactic forms. In an English example, if the sentence were "I would like to have many child" and the word "child" was masked, the subject may erroneously identify the target as "children." On the other hand, if the subjects are not aware of the syntactic structure or are not bothered by its violations, their errors in the identification of masked words should not be related to the syntactically correct form of the target. In this case, the response may be a randomly selected word, or it may relate to the acoustical form, for example, substituting "mild" for the target "child" in the above example. If good readers are more aware of the syntactic structure of the sentence than disabled readers, the percentage of errors of the first type ("syntactic corrections") and of the second type ("random errors") should vary with reading ability. In an extreme case, we should find more "syntactic correction" errors than "random" errors in good readers and vice versa in disabled readers.
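The child/children/mild example suggests a simple classification rule for identification errors. The sketch below is hypothetical, not the authors' scoring procedure, and it ignores near-miss spellings:

    # Hypothetical classification of identification errors (illustrative only).
    def classify_response(response, target, corrected_target):
        """target: the word as presented (syntactically incorrect form);
        corrected_target: the form the syntax of the sentence demands."""
        if response == target:
            return "correct identification"
        if response == corrected_target:
            return "syntactic correction"     # e.g., "children" for "child"
        return "random error"                 # e.g., "mild" for "child"

    print(classify_response("children", "child", "children"))  # syntactic correction
    print(classify_response("mild", "child", "children"))      # random error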

Analyses of the errors made by each reading group would be a first step towards understanding the cause of inter-group differences in syntactic context effects, if they exist. However, in order to assess syntactic awareness as a metalinguistic ability rather than automatic use of syntactic structures for word identification, a more direct measure had to be employed. In a recent study, Fowler (1988) compared good and poor readers' ability to detect and to correct violations of syntax in orally presented sentences. In that study, the ability to judge sentences as correct or incorrect in the "judgment" task was not associated with reading ability. In contrast, good readers performed syntactically better than poor readers in the "correction" task. Fowler concluded that poor readers do not differ from good readers in syntactic knowledge but that they may be inferior in manipulating verbal material in short-term memory (see also Shankweiler & Crain, 1986). We used Fowler's technique to supplement our study of syntactic context effects on the identification of orally presented words. If, as in Fowler's study, a difference emerges only for the correction condition, then syntactic awareness is not at fault. Rather, one would ascribe the differences to syntactic processing difficulties that prevent the disabled readers from using their syntactic knowledge productively.

Method

Tests and Materials

A. Reading Tests. We were interested in testing two kinds of reading: the ability to decode the phonology from print and the ability to use the sentence context in reading without vowels. Because all the standard reading tests in Hebrew primarily test reading comprehension, we constructed two new reading tests for our purposes. The first was a test of decoding ability. It contained a set of 24 meaningless three- or four-letter strings (pseudowords) presented with vowel marks. The vowels were chosen according to Hebrew morphophonemic rules and included all lawful combinations. Each pseudoword was printed individually on a white, 9 cm x 12 cm card. The size of each letter was 0.5 cm. The subject was instructed to read each pseudoword exactly as it was written. Accuracy and naming onset time were measured. The subject's score on this test consisted of the percentage of accurately read pseudowords and the mean latency of naming onset.

The second test was designed to test the ability to read Hebrew without vowel marks and particularly to use the sentence context to determine the reading of unvoweled Hebrew words that were both phonologically and semantically ambiguous. This test contained 48 four- or five-word sentences printed on white cardboard using the same fonts as for the pseudowords in the former test. The last word in each sentence was the target word. In the absence of vowel marks, 32 of the 48 targets were phonologically ambiguous, i.e., they could have been assigned at least two sets of vowels to form two different words. Thus, correct reading of those targets could be determined only by apprehending the meaning of the sentence. The 32 ambiguous targets were 16 pairs of identical letter strings, each representing a different word in the respective sentence. Eight of these 16 ambiguous targets represented two words of equal frequency. The words represented by each of the remaining ambiguous targets differed in frequency, such that one member of each pair was a high-frequency word while the other member was a low-frequency word. In a previous study, Bentin and Frost (1987) reported that when undergraduates were presented with isolated ambiguous words in a naming task, they tended to choose the most frequent phonological alternative. We assumed that, without context, the children would tend to do the same. Therefore, insensitivity to the context of the sentence should increase the number of errors in reading the targets, particularly when the correct response requires the use of the less frequent phonological alternative. The remaining 16 targets were words that without the vowel marks could have been meaningfully read in only one manner. Eight of those sixteen targets were high-frequency words and the other eight targets were low-frequency words. The subjects were instructed to read each sentence aloud. The time that elapsed from the moment the sentence was exposed until the subject finished reading it was measured to the nearest millisecond. The score on this test was the average percentage of errors and the average time to read a sentence.
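The two reading-test scores just described (percentage of accurately read items, plus mean naming-onset latency or mean sentence-reading time) amount to a simple per-subject aggregation. The sketch below illustrates that computation in Python; the data frame, column names, and values are hypothetical stand-ins, not data from the study.

    import pandas as pd

    # Hypothetical per-trial records from the decoding test: one row per
    # pseudoword, with accuracy (1/0) and naming-onset latency in seconds.
    trials = pd.DataFrame({
        "subject":   [1, 1, 1, 2, 2, 2],
        "correct":   [1, 1, 0, 1, 0, 0],
        "onset_sec": [1.4, 1.7, 2.1, 2.2, 2.9, 3.0],
    })

    # Each subject's score: percent accurately read items and mean latency.
    scores = trials.groupby("subject").agg(
        pct_correct=("correct", lambda c: 100 * c.mean()),
        mean_onset=("onset_sec", "mean"),
    )
    print(scores)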

In addition to these two special-purpose tests, each subject was tested for reading comprehension by a standard test. The nation-wide average score on this test for fourth graders is 70% with a SD of 12%.

B. Intelligence tests. The IQ of each subject was obtained either using the WISC (Full Scale) (whenever those data were available) or testing the children on the Raven Colored Matrices and transforming their performance into IQ scores.

C. Syntactic awareness test. Syntactic awareness was assessed by testing identification of auditorily presented words masked by white noise, as in Experiment 1. On the basis of a pilot study with children, in order to keep the overall correct identification of targets around 50%, we increased the signal-to-noise ratio in Experiment 2 from 1:2.75 to 1:2.25.
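The paper does not describe how the speech and noise were mixed, so the following is only a rough sketch of one way a 1:2.25 signal-to-noise ratio could be imposed on a digitized target word; the function name and the RMS-based definition of the ratio are our assumptions.

    import numpy as np

    def mix_with_noise(signal, ratio, rng=np.random.default_rng(0)):
        # Add white noise scaled so that rms(signal)/rms(noise) == ratio
        # (e.g., ratio = 1/2.25 for the 1:2.25 condition). Illustrative
        # only; the paper does not specify amplitude versus power.
        noise = rng.standard_normal(len(signal))
        rms = lambda x: np.sqrt(np.mean(x ** 2))
        noise *= rms(signal) / (ratio * rms(noise))
        return signal + noise

    # Example: mask a dummy 1-second, 10-kHz "speech" signal.
    speech = np.sin(2 * np.pi * 120 * np.linspace(0, 1, 10000))
    masked = mix_with_noise(speech, ratio=1 / 2.25)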

Procedure

Each child was tested individually in three sessions. During the first session, reading performance and intelligence were tested. Reading performance was recorded on tape for subsequent error analysis and off-line measuring of time. At the end of the test of reading without vowels, the experimenter verified whether the subject knew the meaning of the targets that had been read incorrectly. In the very few doubtful cases, the sentence was excluded and a substitute sentence of the same type was given. The children who had been selected for this study (see below) were invited to a second session during which the auditory word identification test was given. The procedures for the word identification test were identical to those of Experiment 1. In addition, at the end of the second session, the children were tested for the ability to repeat from memory the sentences presented during the auditory test. This was done by presenting the children with 16 sentences selected from the same pool of sentences from which the test set was selected. Eight of these 16 sentences were syntactically correct and the other eight syntactically incorrect. In the repetition test the sentences were presented without the masking noise. Finally, during a third session (three months later), all 104 sentences were presented to each child without the masking noise. Following the presentation of each sentence the child was asked whether "this is the way it should be said in Hebrew" (Judgment Task). Whenever the answer was "no," the child was asked to correct the sentence (Correction Task).

Subjects

The good readers were 15 children (7 males) selected from a population of fourth graders of two elementary schools in Jerusalem. Their ages ranged between 8.9 and 9.7 years (mean age 9.3 years). The average IQ (FS) score (as assessed by transforming the Raven score) was 102.5, ranging from 85 to 122.5. They were selected to match poor readers from the same school on decoding ability and IQ. The precise selection criteria will be elaborated in Experiment 3.

The disabled readers were 19 children (12 males), aged from 9.7 to 14 years (mean age 11.6), selected from a population of 32 children with severe reading disorders who had been referred for special supplementary training in reading.

They were within the normal intelligence range (Mean IQ (FS)=104.83, ranging between 85 and 130). The disabled readers selected for the present study were chosen because they not only showed poor decoding ability, as compared to good readers, but also performed badly in the test of reading without vowels, thus suggesting special problems in dealing with context. Table 2 presents the reading performance of the good and deficient readers as revealed by our reading tests.

Table 2. Reading performance of the severely disabled and good readers.

                      Reading voweled nonwords      Reading unvoweled sentences     Reading comprehension
                      % of errors  Time per item    % of errors  Time per sentence  % correct

Good Readers              8.1%        1.6 sec           4.1%         2.9 sec           81.3%
Disabled Readers         38.8%        2.6 sec          19.6%         6.3 sec           55.7%

Children in both groups were all native speakers of Hebrew without known motor, sensory, or emotional disorders. All children had been tested for normal hearing.

Results

The overall percentage of correct identification of masked targets was similar in good (44.2%) and disabled readers (48.3%) (F(1,32)=2.52, MSe=112, p > .12). However, the percentages of syntactically correct and incorrect sentences that were correctly identified were different in the two groups (Table 3). Although syntactically correct sentences were identified better than syntactically incorrect sentences in both groups, the effect of the syntactic context was smaller in the disabled readers than in the good readers.

This observation was supported by a mixed-model two-factor analysis of variance. The between-subjects factor was reading ability, and the within-subjects factor was syntactic context (correct, incorrect). The syntactic context effect was highly significant across groups (F(1,32)=784.47, MSe=50, p < .0001). A more interesting result, however, was the significant interaction that revealed that the syntactic context effect was greater in good readers than in disabled readers (F(1,32)=11.90, MSe=50, p < .002).
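For readers who want to reproduce this kind of design, the sketch below shows how a mixed-model ANOVA (one between-subjects factor, one within-subjects factor) can be run with the pingouin library; the per-subject percent-correct scores and the column names are invented for illustration, not taken from the paper.

    import pandas as pd
    import pingouin as pg

    # Hypothetical long-format data: one row per subject per context condition.
    df = pd.DataFrame({
        "subject":     [1, 1, 2, 2, 3, 3, 4, 4],
        "group":       ["good"] * 4 + ["disabled"] * 4,
        "context":     ["correct", "incorrect"] * 4,
        "pct_correct": [72, 18, 70, 16, 68, 28, 71, 26],
    })

    # Mixed-model ANOVA: 'group' between subjects, 'context' within subjects.
    aov = pg.mixed_anova(data=df, dv="pct_correct", within="context",
                         between="group", subject="subject")
    print(aov[["Source", "F", "p-unc"]])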


Post hoc analysis revealed that good and disabled readers performed equally well with correct sentences. However, the percentage of correct identifications of words embedded in incorrect sentences was higher in disabled than in good readers (p < .01).

Table 3. Percentage of correct identification of syntactically correct and incorrect sentences in the reading disabled and good readers.

                              Good Readers   Disabled Readers

Syntactically Correct
  Mean                            71.4             69.3
  (SEm)                            2.2              1.9

Syntactically Incorrect
  Mean                            17.0             27.0
  (SEm)                            2.2

The errors that children made were distributed among four error types: Type 1 errors were "syntactical corrections," that is, errors that were made in an attempt to use the correct syntactic structure of a syntactically incorrect sentence. Type 2 errors were "random errors," misidentifications that made no sense whatsoever or reflected acoustical confusions. Type 3 errors were "logical substitutions," that is, substitutions of the masked words with other words that gave the sentence a logical meaning. Type 4 were "I don't know" responses, which were not encouraged but were accepted. The percentage of errors of each type (out of the total number of responses) in each group is presented in Table 4.

Table 4. Percentage of errors of each type (out of the total number of responses) made by the disabled and good readers in the auditory identification task.

                                       Type of error
                     Corrections   Random   Logical substitutions   "I don't know"

Reading Disabled
  Mean                   3.9        10.2            21.9                 15.7
  (SEm)                  0.4         1.5             1.7                  2.2

Good Readers
  Mean                   6.2         3.3            17.4                 28.8
  (SEm)                  0.8         0.7             1.7                  2.8

Because we had clear predictions only for syntactic corrections and random errors, we analyzed the distribution of these two error types in each group by a mixed-model (reading group X error type) analysis of variance. This analysis showed that, across the two types of error, the good readers made fewer errors than the disabled readers (F(1,32)=5.11, MSe=18, p < .031). Across groups, the percentage of errors of each type was similar (F(1,32)=3.01, MSe=16, p > .09). Most interesting, the interaction between reading ability and error type was highly significant (F(1,32)=21.54, MSe=16, p < .0001). Post hoc analysis (Tukey-A) revealed that more random errors were made by disabled readers than by good readers, whereas syntactic corrections were more frequent in good than in disabled readers. All the children were able to repeat verbatim all sixteen sentences that they heard without the masking noise.
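The "Tukey-A" post hoc comparisons can be approximated today with Tukey's HSD, as sketched below; the per-subject scores are invented around the group means in Table 4, so the snippet illustrates the procedure rather than reproducing the paper's analysis.

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Invented per-subject error percentages, one cell per
    # reading-group x error-type combination.
    rng = np.random.default_rng(1)
    scores = np.concatenate([
        rng.normal(6.2, 2.0, 15),   # good readers, syntactic corrections
        rng.normal(3.3, 2.0, 15),   # good readers, random errors
        rng.normal(3.9, 2.0, 19),   # disabled readers, syntactic corrections
        rng.normal(10.2, 2.0, 19),  # disabled readers, random errors
    ])
    cells = (["good/correction"] * 15 + ["good/random"] * 15 +
             ["disabled/correction"] * 19 + ["disabled/random"] * 19)

    # Tukey's HSD compares every pair of cells while controlling the
    # family-wise error rate.
    print(pairwise_tukeyhsd(endog=scores, groups=cells, alpha=0.05))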

Good readers were better than disabled readers on both the judgment and the correction tests. Among the disabled readers, however, a secondary distinction was evident between four children who were 13-14 years old and those who were younger. The mean percentage of errors made by each group in each task is presented in Table 5.

Table 5. Percentage of errors made by disabled and good readers in the judgment and correction tasks.

                          "Judgment"   "Correction"

Good Readers
  Mean                        1.3           5.4
  (SEm)                       0.4           0.9

Reading Disabled
  Mean                        6.9          32.1
  (SEm)                       1.9

Older Reading Disabled
  Mean                        0.3           5.6

Because the number of older disabled readers was too small to form a reliable independent level in a factorial design but, on the other hand, clearly formed a distinct group, they were excluded from the statistical evaluation. Thus, the percentage of errors in each task was compared only for good and disabled readers who were more similar in chronological age. As before, a mixed-model analysis of variance was employed where the between-subjects factor was reading group and the within-subjects factor was the test (judgment or correction). The analysis of variance showed that good readers made significantly fewer errors than disabled readers (F(1,24)=26.58, MSe=127, p < .0001) and that more errors were made in the correction than in the judgment task (F(1,24)=79.25, MSe=35, p < .0001). A significant interaction suggested that the task affected the percentage of errors made by disabled readers more than it affected the good readers (F(1,24)=41.0, MSe=35, p < .0001). This interaction supports Fowler's results by emphasizing the difference between the judgment and correction tests. However, in contrast to her results, post hoc Tukey-A tests revealed that the good readers made significantly fewer errors than disabled readers not only in the correction task, but also in the judgment task. The inclusion of the four older disabled readers in the analysis did not change the pattern of results, although these four children clearly performed better than the other disabled readers.

Discussion

For children with a severe reading disability, the syntactic context effect on the identification of spoken words was smaller than for good readers. One explanation for the results might be that disabled readers are worse at identifying auditorily masked words than good readers (Brady, Shankweiler, & Mann, 1983). Such a hypothesis, however, is not supported by the present data. If the disabled readers in the present study had been handicapped in the identification of masked words, any manipulation that increased the difficulty of identifying the words should have had a greater effect on disabled than on good readers. Therefore, we should have observed a stronger rather than a weaker syntactic context effect in disabled readers. Further, if masking had a more deleterious effect on identification of words by disabled relative to good readers, the overall identification performance in the disabled group should have been lower. In fact, the overall correct identification percentage in disabled readers was slightly higher than in the good reader group.

A second account of the results might be that the smaller syntactic context effect in poor readers reflects a more general problem, such as disorders of short-term or working memory. There is indeed ample evidence that disabled readers have problems with verbal short-term memory (Mann, Liberman, & Shankweiler, 1980; for a review see Brady, 1986). Therefore, memory disorders might explain why their performance is affected by sentence context less than that of good readers even when decoding difficulties are eliminated; they simply do not remember the sentence well enough. However, a simply reduced short-term memory span cannot easily account for the present results because the children in both reading groups could accurately repeat sentences similar to those used in the identification task without any difficulty. It is still possible, however, that more complex working memory problems could have contributed to the disabled readers' pattern of performance. We will return to this hypothesis in the General Discussion.

We are left with the most direct hypothesis, that inferior syntactic awareness is the reason for the relatively poor use of syntactic context by the reading disabled children of Experiment 2. This possibility is supported by the results of the error analysis. The percentage of "syntactic correction" errors made by good readers was almost twice as great as that made by disabled readers. "Syntactic correction" errors could have only been made when the subject knew what the correct structure of the sentence should have been and expected it. In those circumstances, when the physical stimulus was degraded the good readers applied syntactic rules and misidentified the target. In the same situation, getting only partial information from degraded stimuli, disabled readers often applied a random guessing strategy, disregarding the sentence context completely. Indeed, the percentage of "random" errors was three times greater in the disabled readers group than in good readers.

An additional question examined in the present experiment was whether the disabled readers had mastered the correct syntactic structures but did not use them properly, or had problems with basic syntactic knowledge. We examined this question by testing the ability of both groups to detect violations of syntactic structure and to correct the detected violations. The good readers performed better than the disabled readers in both tasks. Although the difference between the groups was greater for the correction task, disabled readers were significantly inferior to good readers in the judgment task as well. This latter result contradicts the results reported by Fowler (1988) and suggests that this group of disabled readers were inferior to good readers in their awareness of basic syntactic structures.

The discrepancy between the present results and Fowler's (1988) results, as well as the disagreement between our conclusion regarding the syntactic awareness of disabled readers and previous assertions in the literature that basic phonological disability and deficient use of working memory mechanisms underlie the syntactic inferiority observed in poor readers (e.g., Shankweiler & Crain, 1986; Shankweiler et al., in press), can be explained in two ways. One possible explanation is that different types of mechanisms underlie reading deficiencies in different languages. Recall that we selected our deficient reader group to emphasize problems of using context while reading without vowel marks. In doing so, we may have selected a group of children who were poor in syntactic processing. A second explanation is that we have examined children with a reading disability that was considerably more severe than that of the poor readers examined by Fowler. Experiment 3 therefore attempted to generalize the results of the present experiment to poor readers selected, as in Fowler's study, from the normal student population of regular elementary schools.

EXPERIMENT 3

Any attempt to generalize about the characteristics of reading disability or to predict the performance of children with reading disorders is impeded by the heterogeneity of this population. Indeed, reading disorders can appear as the most conspicuous symptom in children who suffer from attentional disorders or general learning disability; they can be the main symptom (but rarely the only symptom) of developmental dyslexia and, at the other extreme, they may characterize the performance of otherwise normal students who happen to be at the lower end of a normal distribution of reading ability. It is possible, therefore, that the prior selection of different types of reading disorders underlies most disagreements about this important handicap among educators and scientific investigators.

In Experiment 2, the reading disabled children were selected from a population of children with severe reading disorders. Although they were at least in the fourth grade, had normal IQs, and had no documented neurological symptoms, some of those children could hardly read single words with or without vowel marks. We found that they were inferior to good readers in syntactic knowledge and in using syntactic context to help identify spoken words. In Experiment 3, our aim was to extend these findings to another reader group. Thus, we compared good readers with children in regular classes who, when formally tested, were inferior to good readers in reading performance. In particular, we wished to compare the good readers with a group of relatively poor readers who were equal to the good readers in basic decoding ability (as revealed by their performance on reading voweled pseudowords) but were poorer at reading without vowel marks. We assumed that the relatively poor reading performance of this group primarily reflects inefficient use of the sentence context, and expected to be able to measure this relative disability by our auditory test of syntactic context effects. In addition, the good and poor readers in the present experiment were tested for ability to detect and to correct syntactic violations.

Method

Subjects

The subjects were 30 children selected from a population of 167 fourth graders in two public elementary schools. The selection was based on performance on the test of decoding ability and the test of reading without vowels which were described in Experiment 2. Two reading groups were assembled. The poor reader group included 15 children (9 males); their ages ranged between 8.8 and 9.6 years (mean age 9.1 years). Their average IQ (FS) score (as assessed by transforming the Raven score) was 102.5, ranging from 85 to 122.5. Each of those poor readers made no more than four errors (16.6%) in the test of decoding voweled pseudowords but at least twice as many errors as the good readers while reading meaningful sentences without vowels. On the basis of the assumption that the relatively poor reading performance of those children reflected problems with processing of contextual information, we will label this group the "poor context" group. The good readers were the same 15 children who were described in Experiment 2. Each child in this group was selected to match one child in the poor context group on the ability to decode voweled pseudowords and on IQ. However, the good reader's performance on the unvoweled sentences was at least twice as good as that of his or her matched subject in the poor context group. The average scores of the two reading groups on the reading tests are presented in Table 6.

Tests and Materials

The reading tests and the auditory word identification test were identical to those described in Experiment 2. The IQ scores were estimated by testing the children with the Raven Colored Matrices test.


Table 6. Reading performance of the good and poor context readers.

                      Reading voweled nonwords      Reading unvoweled sentences     Reading comprehension
                      % of errors  Time per item    % of errors  Time per sentence  % correct

Good Readers              8.1%        1.6 sec           4.1%         2.9 sec           81.3%
Poor Context Readers      7.8%        1.7 sec          18.2%         4.7 sec           72.0%

Procedure

The procedure was similar to that employed in Experiment 2. The children were tested in three sessions. The first session was dedicated to the selection of subjects for this study. All 167 fourth-graders were tested for reading ability and IQ. During the second session only the selected children were tested on the auditory word identification test, and during a third session, their ability to detect and correct the syntactic violations was tested. During the third session the sentences were presented without any masking noise. Sessions one and two were held close to the beginning of the academic year. Session three was three months later.

Results

The percentage of total correct identifications in the "poor context" group (40.4%) was not significantly different from that of the good readers (44.2%) (F(1,28)=1.03, MSe=209, p > .31).

The percentages of syntactically correct and incorrect sentences that were correctly identified in each group are presented in Table 7.

Table 7. Percentage of correct identification of syntactically correct and incorrect sentences in good and poor context readers.

                              Good Readers   Poor Context Readers

Syntactically Correct
  Mean                            71.4              65.3
  (SEm)                            2.2               3.9

Syntactically Incorrect
  Mean                            17.0              15.4
  (SEm)                            2.4               3.6


These data were analyzed by a mixed-model analysis of variance as in Experiment 2. As before, the syntactic context effect was highly significant (F(1,28)=505.56, MSe=81, p < .0001). However, in contrast to the findings of Experiment 2, the interaction between the syntactic context effect and reading group was not significant (F(1,28)=0.96).

As in Experiment 2, the errors made by each group were categorized into four types. Type 1 were "syntactical corrections," Type 2 were "random errors," Type 3 were "logical substitutions," and Type 4 were "I don't know." The distribution of errors in each of the two reading groups (out of the total number of responses) is presented in Table 8.

Table 8. Percentage of errors of each type made by good and poor context readers (out of the total number of responses) in the auditory identification task.

                                       Type of error
                     Corrections   Random   Logical substitutions   "I don't know"

Good Readers
  Mean                   6.2         3.3            17.4                 28.8
  (SEm)                  0.8         0.7             1.7                  2.2

Poor Context Readers
  Mean                   3.5         6.9            24.5                 24.8
  (SEm)                  0.6         1.2             2.9                  3.5

Our a priori predictions concerned only errors of Type 1 (syntactic corrections) and Type 2 (random errors). A mixed-model analysis of variance showed no significant main effects but a significant interaction between the type of error and reading group (F(1,28)=10.63, MSe=14, p < .003). Post hoc comparisons (Tukey-A) showed that the percentage of syntactical correction errors was higher in good readers than in the poor context group, whereas the percentage of random errors was higher in the poor context group than in good readers.

The average percentages of errors in the judgment and correction tasks for each group are presented in Table 9.

A mixed-model analysis of variance as in Experiment 2 was used to analyze these data. Across groups, there were more errors in the correction task than in the judgment test (F(1,24)=39.16, MSe=16, p < .0001). Overall, the good readers made fewer errors than poor context readers (F(1,24)=5.24, MSe=25, p < .035). The interaction between the test and the group factors was significant (F(1,24)=6.32, MSe=16, p < .020). Replicating Fowler's (1988) results, post hoc analysis revealed that good and poor context readers did not differ in the judgment test, whereas in the correction test good readers made significantly fewer errors than poor context readers.

Table 9. Percentage of errors made by good and poor context readers in the judgment and correction tasks.

                          "Judgment"   "Correction"

Good Readers
  Mean                        1.3           5.4
  (SEm)                       0.4           0.9

Poor Context Readers
  Mean                        1.7          11.4
  (SEm)                       0.5           2.3

All children were able to repeat verbatim all the 16 sentences presented to them in the absence of masking white noise.

Discussion

Experiment 3 sought to generalize the results of Experiment 2 to groups of relatively poor readers selected from the normal distribution of fourth graders. Unlike in Experiment 2, the magnitude of the syntactic context effect was similar in the good and in the poor context readers.

The syntactic ability of the children in the poor context group was not, however, entirely equivalent to that of the good readers. Analysis of the identification errors made by each group revealed that the proportion of errors that reflected an attempt to correct the syntactic violation was lower in children who had relatively more difficulties in reading unvoweled words than in good readers who were matched with them for phonological decoding ability. In contrast, the proportion of misidentifications that reflected total ignorance of the sentence's context (either syntactic or semantic) was lower in good readers than in the poor context group. This pattern of errors might suggest that although word identification was similarly affected by syntactic context in both reading groups, the good readers were more aware of the syntactic structure of the sentence than were the children in the poor context group. The results of the judgment test, however, did not support this hypothesis. As it turned out, both groups were equally sensitive to violations of syntactic structures. It is possible, though, that part of this result reflected a ceiling effect in that task. The groups differed, however, in their ability to correct those violations.

The common aspect of both the "syntactical correction" errors and the test of correcting syntactic violations is that both measures reflect the child's ability to actively generate correct syntactic structures. This ability is not required by the judgment test and may not be reflected in identification performance. Therefore, the present data suggest that although the good and the relatively poorer readers did not differ in their syntactic awareness (that is, in the sensitivity to and knowledge of basic syntactic structures), the good readers had a superior ability to use their syntactic knowledge, and a tendency to do so.

GENERAL DISCUSSION

In the present study we examined the relation between reading ability and syntactic competence as it is reflected in the ability to use syntactic context for word identification and to detect and correct syntactic violations. In contrast to the great majority of studies of context effects in good and poor readers, we used auditory rather than printed word identification. Auditory presentation was used to circumvent a bias that might have been induced by the reading disorder itself. Thus, we were better able to assess differences in syntactic processing ability that might relate to reading achievement. The sensitivity of our auditory test to syntactic context was verified by showing that undergraduate students, fluent readers of Hebrew, identified target words masked by white noise significantly more accurately if the targets were syntactically congruent with the sentence in which they appeared than if they violated the syntactic structure.

A syntactic effect similar to that found in undergraduates was obtained when the same test was given to fourth graders. However, the difference between the correct identification of syntactically correct and syntactically incorrect sentences was smaller in a group of children with a severe reading disability than in either good readers or relatively poor readers selected from the normal distribution of fourth-grade students (the poor context group). The good readers and the poor context group did not differ in the auditory identification test.

A second difference between the severely reading disabled and the children in the poor context group was observed in the judgment task. Children in the poor context group detected sentences that contained an error as well as good readers did. In contrast, the reading disabled were worse in this test than either the good readers or the poor context readers.

The relative inferiority of the severely disabled readers cannot be accounted for only by a simple reduction of their short-term memory span. In contrast to the complex sentences and complex syntactic structures typically used in other studies, we used only very short and simple sentences (three or four words). When formally tested, all the children were able to repeat the sentences verbatim without any problem. Holding a sentence in working memory for syntactic analysis probably requires more mental effort and retention of the whole sentence for a longer time than required by immediate repetition. As was previously reported, the factor of delay influences the memory ability of poor readers more than that of good readers (Liberman, Shankweiler, Liberman, Fowler, & Fischer, 1977). However, rather than requiring the manipulation of more subtle syntactic aspects, the syntactic violations which we have used in the present study were, as we have said, straightforward corruptions of the basic syntactic relationship between subject and predicate, or a word order that clearly violated the syntactic structure of the sentence. Therefore, we agree with Byrne (1981) in doubting that deficient use of verbal memory mechanisms by disabled readers, at least as this deficiency could be revealed by simple repetition, was a major cause of the deficient use of syntactic context in the present study. Instead, we are inclined to believe that the reduced syntactic ability suggested by the performance of disabled readers reflected a genuine deficiency of linguistic endowment (in the syntactic and phonological domains) rather than reduced general cognitive ability or poor metalinguistic insight.

Although the syntactic context effect on the identification task was equal in the good readers and the poor context readers, the syntactic competence of these two groups was not entirely equivalent. In particular, the good readers made significantly more syntactical correction errors than the poor context readers. The difference between the two groups was even more conspicuous in the correction task. Similar to the results reported by Fowler (1988) for American poor readers, the ability of poor context children to correct syntactic violations was significantly inferior to that of good readers. This result is particularly interesting because, as was noted earlier, the syntactic violations used in the present study were much simpler and more direct than those used by Fowler. Moreover, our samples of good and poor context readers were matched for their ability to decode and read voweled nonwords. Therefore, just as for the disabled readers, the difference between the ability of good and poor context readers to correct syntactically incorrect sentences cannot be easily accounted for only by assuming differences between the poor and the good readers in general cognitive skills. Rather, we suggest that, at least for the specifically selected group of poor readers whose reading errors reflected reduced ability to analyze contextual information, both the correction test and the pattern of errors in the identification test suggest a specific impairment in the ability to use their syntactic knowledge in a productive way.

Although both the reading disabled and the poor context readers are inferior to good readers in syntactic competence, these two groups differ from one another. In comparison to good readers, the disabled readers showed a weaker syntactic context effect in the word identification task, an inferior ability to detect syntactical aberrations in spoken sentences, and an inferior ability to correct detected syntactically incorrect sentences. The poor context children were equal to good readers in the syntactic context effect on word identification, and were equally able to detect syntactic aberrations, but were inferior to good readers in the ability to correct the detected errors. This pattern of results suggests that the different tasks tap different aspects of syntactic competence, which might develop at different rates.

Some insight into the nature of the syntactic disability reflected by the word identification task comes from the observation that the significant interaction between the reading group and the syntactic context effect was not caused by a symmetrical effect of reading group on both syntactically correct and incorrect sentences. Rather, it seems that syntactically correct sentences were identified equally well by both reading disabled and good readers; however, good readers were affected more than reading disabled by violations of the syntactic structure of sentences. A possible interpretation that is supported by these data is that automatic syntactic processing was equivalent in both groups, but that the disabled readers were less aware of the syntactic structure and did not use identification strategies that were based upon it.


A more definite interpretation obviously requires a neutral condition in the identification task, which was absent in this study. However, these results strongly suggest that the identification test is more sensitive to strategic differences and syntactic awareness than to (automatic) syntactic processing.

It is noteworthy to recall in this connection the four older disabled readers: Although they were similar to other disabled readers in the auditory identification task, their performance in the judgment and correction tasks was similar to that of good readers. These older children may have had a higher level of syntactic competence, so that when their attention was intentionally directed to the structure of the sentence (as was the case in the judgment and correction tasks, in contrast to the identification task), the additional knowledge enabled them to use their syntactic knowledge productively.

In conclusion, the results of the present study suggest that syntactic factors are directly related to reading disabilities, at least in Hebrew. Two distinct populations of poor readers have been identified. One group was formed of children who, in the absence of a better term, were labeled reading disabled. These children were probably able to use basic syntactic structures, as was evident in their everyday speech ability and in their identification of syntactically correct sentences. However, they were not explicitly aware of the syntactic structures, and therefore were not inhibited by syntactic incongruity in the identification test; they were less able than good readers to detect syntactically incorrect sentences, and they were less able to correct those errors that had been detected. The second group of poor readers were good decoders but were relatively weak in analyzing the context of the sentence in reading. The performance of these children in the identification and judgment tests suggested that they were aware of basic syntactic structures and could use them for perception of speech. However, they were inferior to good readers in using those structures productively, as suggested by their relatively worse performance in the correction test. Thus, our data set limits on previous assertions that poor reading is not related to syntactical impairment (Gleitman & Rozin, 1977; Liberman, 1971; Mattingly, 1972; Shankweiler & Crain, 1986).

Of course, we do not claim to have found a causal relationship between syntactic ability and reading disorders. What we have seen is that, at least in Hebrew, there are poor readers of normal intelligence who are good decoders. Their performance suggests that there are aspects of poor reading that are not accounted for by deficient phonological processing. Moreover, we have shown that this impairment is associated with deficiencies in linguistic ability, here exemplified in the syntactic domain.

REFERENCES

Becker, C. A., & Killion, T. H. (1977). Interaction of visual and cognitive effects in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 3, 389-401.

Bentin, S., & Frost, R. (1987). Processing lexical ambiguity and visual word recognition in a deep orthography. Memory & Cognition, 15, 13-24.

Brady, S. A. (1986). Short-term memory, processing and reading ability. Annals of Dyslexia, 36, 138-153.

Brady, S. A., Shankweiler, D., & Mann, V. A. (1983). Speech perception and memory coding in relation to reading ability. Journal of Experimental Child Psychology, 35, 345-367.

Brittain, M. M. (1970). Inflectional performance and early reading achievement. Reading Research Quarterly, 6, 34-48.

Byrne, B. (1981). Deficient syntactic control in poor readers: Is a weak phonetic memory code responsible? Applied Psycholinguistics, 2, 201-212.

Chomsky, C. (1969). The acquisition of syntax in children from 5 to 10. Cambridge, MA: MIT Press.

Cromer, W., & Wiener, M. (1966). Idiosyncratic response patterns among good and poor readers. Journal of Consulting Psychology, 30, 1-10.

Fowler, A. E. (1988). Grammaticality judgment and reading skill in grade 2. Annals of Dyslexia, 38, 73-94.

Glass, A. L., & Perna, J. (1986). The role of syntax in reading disability. Journal of Learning Disabilities, 19, 354-359.

Gleitman, L. R., & Rozin, P. (1977). The structure and acquisition of reading I: Relations between orthography and the structure of language. In A. S. Reber & D. L. Scarborough (Eds.), Toward a psychology of reading: The proceedings of the CUNY conferences (pp. 1-54). Hillsdale, NJ: Lawrence Erlbaum Associates.

Goldman, S. R. (1976). Reading skill and the minimum distance principle: A comparison of listening and reading comprehension. Journal of Experimental Child Psychology, 22, 123-142.

Goodman, G. O., McClelland, J. L., & Gibbs, R. W. (1981). The role of syntactic context in word recognition. Memory & Cognition, 9, 580-586.

Guthrie, J. T. (1973). Reading comprehension and syntactic responses in good and poor readers. Journal of Educational Psychology, 65, 294-299.

Lasnik, H., & Crain, S. (1985). On the acquisition of pronominal reference. Lingua, 65, 135-154.

Liberman, I. Y. (1971). Basic research in speech and lateralization of language: Some implications for reading disability. Bulletin of the Orton Society, 21, 71-87.

Liberman, I. Y., Shankweiler, D., Liberman, A. M., Fowler, C., & Fischer, F. W. (1977). Phonetic segmentation and recoding in the beginning reader. In A. S. Reber & D. L. Scarborough (Eds.), Toward a psychology of reading: The proceedings of the CUNY conferences (pp. 207-225). Hillsdale, NJ: Lawrence Erlbaum Associates.

Lukatela, G., Kostić, A., Feldman, L. B., & Turvey, M. T. (1983). Grammatical priming of inflected nouns. Memory & Cognition, 11, 59-63.

Mann, V. A., Liberman, I. Y., & Shankweiler, D. (1980). Children's memory for sentences and word strings in relation to reading ability. Memory & Cognition, 8, 329-335.

Massaro, D. W., Jones, R. D., Lipscomb, C., & Scholz, R. (1978). Role of prior knowledge on naming and lexical decisions with good and poor stimulus information. Journal of Experimental Psychology: Human Learning and Memory, 4, 498-512.

Mattingly, I. G. (1972). Reading, the linguistic process, and linguistic awareness. In J. F. Kavanagh & I. G. Mattingly (Eds.), Language by ear and by eye (pp. 133-148). Cambridge, MA: MIT Press.

Meyer, D. E., Schvaneveldt, R. W., & Ruddy, M. G. (1975). Loci of contextual effects on visual word recognition. In P. M. A. Rabbitt & S. Dornic (Eds.), Attention and performance V (pp. 98-118). New York: Academic Press.

Newcomer, P., & Magee, P. (1977). The performance of learning (reading) disabled children on a test of spoken language. The Reading Teacher, 30, 896-903.

Perfetti, C. A. (1985). Reading ability. New York: Oxford University Press.

Perfetti, C. A., Goldman, S., & Hogaboam, T. (1979). Reading skill and the identification of words in discourse context. Memory & Cognition, 7, 273-282.

Perfetti, C. A., & Roth, S. (1981). Some of the interactive processes in reading and their role in reading skill. In A. Lesgold & C. A. Perfetti (Eds.), Interactive processes in reading (pp. 269-298). Hillsdale, NJ: Lawrence Erlbaum Associates.

Rozin, P., & Gleitman, L. R. (1977). The structure and acquisition of reading II: The reading process and the acquisition of the alphabetic principle. In A. S. Reber & D. L. Scarborough (Eds.), Toward a psychology of reading: The proceedings of the CUNY conferences (pp. 55-142). Hillsdale, NJ: Lawrence Erlbaum Associates.

Schvaneveldt, R. W., Ackerman, B., & Semlear, T. (1977). The effect of semantic context on children's word recognition. Child Development, 48, 612-616.

Schwantes, F. M., Boesl, S. L., & Ritz, E. C. (1980). Children's use of context in word recognition: A psycholinguistic guessing game. Child Development, 51, 730-736.

Shankweiler, D., & Crain, S. (1986). Language mechanisms and reading disorders: A modular approach. Cognition, 24, 139-168.

Smith, F. (1971). Understanding reading. New York: Holt, Rinehart, and Winston.

Stanovich, K. E., & West, R. F. (1981). The effect of sentence context on ongoing word recognition: Tests of a two-process theory. Journal of Experimental Psychology: Human Perception and Performance, 7, 658-672.

Stanovich, K. E., West, R. F., & Feeman, D. J. (1981). A longitudinal study of sentence context effects in second-grade children: Tests of an interactive-compensatory model. Journal of Experimental Child Psychology, 32, 185-199.

Vellutino, F. R. (1979). Dyslexia: Theory and research. Cambridge, MA: MIT Press.

Vogel, S. A. (1974). Syntactic abilities in normal and dyslexic children. Journal of Learning Disabilities, 7, 103-109.

West, R. F., & Stanovich, K. E. (1978). Automatic contextual facilitation in readers of three ages. Child Development, 49, 717-727.

West, R. F., & Stanovich, K. E. (1986). Robust effects of syntactic structure on visual word processing. Memory & Cognition, 14, 104-112.

Wright, B., & Garrett, M. (1984). Lexical decision in sentences: Effects of syntactic structure. Memory & Cognition, 12, 31-45.

FOOTNOTES

Journal of Experimental Child Psychology, in press.

*Also Department of Psychology and School of Education, The Hebrew University, Jerusalem, Israel.

**Also Department of Education, University of Connecticut, Storrs.


Haskins Laboratories Status Report on Speech Research, 1989, SR-99/100, 195-210

Effect of Emotional Valence in Infant Expressions upon Perceptual Asymmetries in Adult Viewers

Catherine T. Best† and Heidi Freya Queen††

Research on normal and brain-damaged adults indicates that the cerebral hemispheres are specialized for emotional as well as cognitive functions. However, controversy remains over which pattern of cerebral organization best accounts for emotion perception: overall right hemisphere (RH) superiority; RH specialization for negative emotion but left hemisphere (LH) specialization for positive emotion; RH specialization for negative emotion but no asymmetry for positive; or RH specialization for avoidance-related emotions and LH specialization for approach-related ones. Most studies of normal adults support overall RH specialization for emotion perception. However, there is some suggestion of valence effects in perceptual asymmetries, which may depend on engagement of emotional responses in the viewer. The present research examined asymmetries in adults' perception of smiling and crying infant expressions because, according to ethological theory, infant characteristics elicit heightened emotional responses in adults. Results from Experiment 1 supported RH specialization for perception of negative expressions, with a lack of asymmetry for positive expressions. Experiments 2 and 3 investigated whether this negative-valence effect can be attributed to differences in the involvement of LH feature-oriented versus RH holistic approaches to basic information processing, or rather to some other independent, emotion-related specialization of the hemispheres. The results of the latter two experiments were inconsistent with the information processing explanation. Discussion concludes with a suggestion that the adult's specialized RH sensitivity to infant crying expressions may have evolved in response to selection pressures for rapid response to signals that indicate potential threats to infant survival.

This research was supported in part by NIH grant NS-24655 and by a Biomedical Research Support Grant from Wesleyan University to the first author, as well as by NIH grant HD-01994 to Haskins Laboratories.

We thank the following people for their help in completing the paper and the research described here: Christine Blackwell, Shama Chaiken, Linda Creem, Angelina Dias, Glenn Feitelson, Sari Kalin, Dare Lee, Roxanne Shelton, Alicia Sisk, Leslie Turner and Jennifer X. Wilson for their help with stimulus preparation and with collecting and scoring data; Nathan Brody and Michael Studdert-Kennedy for their thoughtful, constructive comments on earlier drafts of the manuscript; and Katherine Hildebrandt (now Karraker) for making her infant photographs and emotional rating data available to us.

Research findings with both unilateral brain-damaged patients and normal adults have led to general consensus that the human cerebral hemispheres are differentially involved in emotional, as well as cognitive, processes. However, the exact pattern of hemispheric involvement in emotions remains controversial. According to the most widely-held view, the right hemisphere (RH) dominates overall in the perception and expression of emotion, across both negative and positive emotional valence (e.g., Campbell, 1978; Chaurasia & Goswami, 1975; Gainotti, 1972, 1988; Hirschman & Safer, 1982; Ladavas, Umilta & Ricci-Bitti, 1980; Ley & Bryden, 1979, 1981; Safer, 1981; Strauss & Moscovitch, 1981). For convenience, that view will be referred to here as the RH hypothesis. The major counter-proposal has been that the right hemisphere predominates in perception and expression of negative emotions, the left in positive emotions, a view we will refer to as the valence hypothesis (e.g., Ahern & Schwartz, 1979; Dimond & Farrington, 1977; Natale, Gur & Gur, 1983; Reuter-Lorenz & Davidson, 1981; Reuter-Lorenz, Givis & Moscovitch, 1983; Rossi & Rosadini, 1967; Sackeim, Greenberg, Weiman, Gur, Hungerbuhler, & Geschwind, 1982; Silberman & Weingartner, 1986; Terzian, 1964). Several variations on the valence hypothesis have also been offered. Some evidence suggests that while negative emotions show differential right hemisphere involvement, there may be less hemispheric asymmetry for positive emotions (e.g., Dimond, Farrington & Johnson, 1976; Ehrlichman, 1988; Sackeim & Gur, 1978, 1980); we will call this view the negative-valence hypothesis. Another possibility is that the differential involvement of the left and right hemispheres in emotions may depend on the motivational qualities of approach versus avoidance, respectively, rather than on the positive versus negative valence of the emotion per se (e.g., Kinsbourne, 1978). According to Davidson and colleagues, such an approach-avoidance distinction between hemispheres pertains only to the subject's emotional experience (internal feeling-state) and expression (mediated by frontal lobes), but not to perception of emotions (parietal lobes), which shows an overall right hemisphere superiority (Davidson, 1984; Davidson & Fox, 1982; Davidson, Schwartz, Saron, Bennett, & Goleman, 1979; Fox & Davidson, 1986, 1987, 1988). For brevity, we will call the latter proposal the motivational hypothesis.

This report focuses on asymmetries in normal adults for perception of infant facial emotional expressions. The majority of findings on perception of adult facial expressions by normal, neurologically-intact subjects have favored the RH hypothesis (e.g., Brody, Goodman, Halm, Krinzman & Sebrechts, 1987; Bryden, 1982; Bryden & Ley, 1983; Campbell, 1978; Carlson & Harris, 1985; Gage & Safer, 1985; Heller & Levy, 1981; Hirschman & Safer, 1982; Levy, Heller, Banich, & Burton, 1983; Ley & Bryden, 1979, 1981; Moscovitch, 1983; Safer, 1981; Segalowitz, 1985; Strauss & Moscovitch, 1981). These studies have typically found a left visual field (LVF) advantage, implying RH superiority, in perception of both positive and negative emotional expressions.

Only a few perceptual asymmetry studies have supported the valence hypothesis or its variants. In favor of the valence hypothesis, adults rate tachistoscopically-presented facial expressions more negatively when they are presented to the LVF-RH, but rate them more positively in the right visual field (RVF)-left hemisphere (LH), although the RH is better overall at differentiating among categories of emotion (Natale et al., 1983). Similarly, when subjects must identify the visual field containing an emotional expression during simultaneous tachistoscopic presentations of an emotional expression in one visual field and a neutral expression in the opposite visual field, they detect negative expressions more rapidly in the LVF-RH but detect positive expressions more rapidly in the RVF-LH (Reuter-Lorenz & Davidson, 1981; Reuter-Lorenz et al., 1983). The motivational hypothesis is supported by research on EEG responses in subjects viewing films of negative versus positive emotional expressions. Both adults (Davidson et al., 1979) and infants (Davidson & Fox, 1982) showed greater electrocortical activation of the right frontal lobe while viewing emotionally negative films, but greater activation of the left frontal lobe during positive films. However, parietal lobe activation was greater on the right than the left side at both ages for both types of film. No studies on perception of facial expressions support the negative-valence hypothesis. However, when emotionally negative films have been presented to a single hemisphere by having subjects wear half-silvered contact lenses (Dimond et al., 1976), or when emotionally negative odorants have been restricted to a single hemisphere via presentation to the ipsilateral nostril (Ehrlichman, 1988), subjects rated the stimuli presented to the RH as more intensely negative, without showing significant asymmetries for rating emotionally positive stimuli.

Why are there such inconsistent findings across studies of cerebral asymmetries in normal adults' perception of facial expressions? To some extent, they may be explained by variations in methodology and task requirements. Studies supporting the RH hypothesis have typically required subjects to recognize or discriminate facial expressions. In contrast, the studies favoring variants of the valence hypothesis have called for judgments about the emotionality of the stimuli. Recognition and discrimination judgments call upon basic cognitive-perceptual skills, and may be carried out by so-called "cold" cognitive abilities, whereas judgments about emotionality may more directly require the viewer to tap into emotional processes. This observation raises the related possibility that the presence or absence of a valence effect in perception may depend on the viewer's own emotional response to the stimuli, as Davidson (1984) and Ehrlichman (1988) have suggested. The viewer's emotional response, in turn, could very likely be influenced by whether the emotions represented in the stimulus photographs are genuine or spontaneous versus simulated or posed. Clinical evidence indicates that productions of spontaneous and posed expressions are mediated by different neural pathways; the former is disturbed by damage to the temporal lobes or extrapyramidal system, while the latter is disturbed by frontal or pyramidal damage (e.g., Monrad-Krohn, 1924; Remillard, Anderman, Rhi-Sausi & Robbins, 1977; Rinn, 1984). These two classes of expression could therefore be expected to provide different information to the viewer about the emitter's internal emotional state. Viewers would presumably be more likely to have an emotional response themselves while viewing genuine emotional expressions than they would while viewing simulated expressions. In this context, it is important to note that nearly all the data on asymmetries in perception of facial expressions have been obtained with stimuli containing posed rather than spontaneous emotional expressions.

For these reasons, we conducted a series of experiments requiring judgments about the emotionality of facial expression stimuli that should be more likely to elicit emotional responses in the viewer. Photographs of smiling and crying infants were chosen as the stimulus materials, based on several considerations. First, infant expressions are certainly more spontaneous and genuine, thus providing a more direct window on the infant's actual affective state, than are most adult expressions, especially those facial expressions that occur in social situations. Even in the case of so-called spontaneous expressions in adults, the facial display is often influenced to some extent by the forces of social conditioning and cultural display rules (Buck, 1986; Ekman, 1972), which would have much less or no influence on infants' expressions (e.g., Campos, Barrett, Lamb, Goldsmith, & Stenberg, 1983; Rothbart & Posner, 1985). It is generally assumed that infants do not begin to simulate, mask, or deliberately control their facial expressions until the second year of life (e.g., Campos et al., 1983; Oster & Ekman, 1978; Rothbart & Posner, 1985; Sroufe, 1979; cf. Fox & Davidson, 1988). Second, whereas adult expressions often involve complex mixtures of emotions, infant expressions tend to be simpler, reflecting purer examples of the basic categories of emotion (Campos et al., 1983; Izard, 1979; Izard, Huebner, Risser, McGinnes, & Dougherty, 1980). This characteristic of infant expressions may elicit simpler, more straightforward emotional responses from viewers than do more complex adult expressions. Third, ethological theory and research argue that infant expressions and appearance strongly tend to elicit emotional responses from adults, and do so to a greater extent than do the expressions and appearance of (unfamiliar) adults (e.g., Bowlby, 1969; Eibl-Eibesfeldt, 1975; Lorenz, 1935, 1981; Lorenz & Leyhausen, 1973). Adults' emotional responses to infant signals, including infant emotional expressions, are part of a mutually adapted system of evolved behaviors that promote the development of the relatively helpless human infant, which thus fosters the reproductive fitness of individuals displaying these characteristics, and ultimately the survival of the species. These responses to infants are, of course, particularly strong in their caregivers, but are present in all humans.

Infant crying and smiling are of particular interest to the present research when considered in ethological terms. Both serve to promote physical proximity between the infant and the caregiver, although for different reasons (e.g., Bowlby, 1969; Campos et al., 1983; Emde, Gaensbauer & Harmon, 1976). Smiling indicates a positive affective state and emotional approach of the infant toward the adult with which it is interacting. Infant smiling typically also elicits positive feelings and a corresponding approach response from that adult. The motivational tendencies associated with infant crying, however, differ for the infant and the responding adult. The infant's distress indicates negative feelings associated with some noxious stimulation or situation, and therefore a tendency for withdrawal in the infant. However, given an infant who cannot actively, physically withdraw itself, the crying typically elicits negative or distress feelings in nearby adults, which usually leads them (particularly caregivers) to attempt to aid the infant by eliminating the source of the distress. Thus, approach behavior is elicited in the adult.

Based on this analysis of adults' responses to infant smiles and cries, the four hypotheses outlined above regarding cerebral organization for emotional processes would each predict a different pattern of asymmetries in adults' perception of infant emotional expressions. The RH hypothesis would predict an overall LVF advantage that is unaffected by the valence of the emotional expression. The valence hypothesis would predict a LVF-RH advantage for perception of infant crying expressions, but a RVF-LH advantage for perception of infant smiles. The negative-valence hypothesis would instead predict a LVF-RH advantage for crying expressions but a smaller or nonexistent bias for perception of smiles. Finally, the motivational hypothesis would most likely predict a RVF-LH advantage for perception of both infant cries and smiles, because both should elicit approach responses in adult viewers. The RVF-LH bias might be larger for smiles than for cries, to the extent that smiles might elicit a stronger approach tendency when the adult viewers are not caregivers of the infants depicted.

Experiment 1 investigated these possibilities, using a set of photographs of infants' smiling and crying expressions. For this purpose, we employed the free-field viewing procedure developed by Levy, Heller, Banich and Burton (1983), in which subjects must choose which member of a pair of mixed-expression chimeras for each of a number of posers appears to be emotionally more intense (happier or sadder). Each pair of chimeras displays a half-neutral, half-emotional facial expression of a given poser; in one chimera, the emotional expression appears on the left side of the photograph, while in the other it appears on the right side. Significant asymmetrical biases in the viewers' choices between these pairs of chimeras are interpreted as reflecting asymmetrical activation of the cerebral hemispheres in response to the task, following Kinsbourne's (1978) proposal that activation of one cerebral hemisphere will cause an attentional bias (increased perceptual sensitivity) favoring the contralateral spatial hemifield. Thus, if chimeras with the emotional expression on the left side of the photograph are perceived to be more emotional (happier or sadder) than those with the emotional expression on the right side, a LVF bias would be indicated, implying greater activation of the RH during the emotion judgment task. The converse pattern would indicate a RVF-LH bias (cf. Grega, Sackeim, Sanchez, Cohen & Hough, 1988). Levy et al. (1983) found a LVF-RH bias in adults' perception of half-neutral, half-smiling chimeras of adult posers' faces.

Experiments 2 and 3 were designed to examine whether the valence effect that was found in Experiment 1 could be attributed to differences in the extent to which judgments about smiling versus crying expressions involve the basic information processing approaches of the right versus the left hemisphere. The left hemisphere has been characterized as having a feature-oriented, analytical approach to information processing, whereas the right hemisphere approach has been described as holistic, gestalt-like, or synthetic (e.g., Bradshaw & Nettleton, 1981; Bryden, 1982; Levy, 1974). An effect of valence on asymmetries in the perception of emotional expressions might result from greater involvement of LH feature-oriented processing for one category of expression than for the other (cf. Moscovitch, 1983). For example, smiles may be perceived by simply focusing on whether the corners of the mouth are upturned or not, but perception of crying expressions may require the perceiver to attend to the overall configuration of mouth, eyes and brows. If so, then manipulating the stimulus properties to emphasize a focus on specific features should shift the degree and direction of visual field bias for judgments about negative and positive expressions. Alternatively, the valence effect could reflect differences in hemispheric specialization for positive and negative emotions per se, independent of the basic information processing skills of the two hemispheres. In the latter case, feature-oriented stimulus manipulations would not be expected to influence the valence effect on perceptual asymmetries. These possibilities were explored in the last two experiments reported here.

EXPERIMENT 1

Method

Subjects

Forty-six university students (23 female, 23 male) were included in this study; all had also participated in a related study of asymmetries in infants' facial expressions (Best & Queen, 1989). All were familial right-handers, a population that is more consistently and more strongly lateralized than non-right-handers for various hemispherically-specialized functions, including the perception of emotional expressions (Chaurasia & Goswami, 1975; Heller & Levy, 1981). Subjects completed a handedness checklist that assessed degree of hand preference on 10 common unimanual activities, including writing, as well as the writing hand preference of immediate family members. To be considered strongly right-handed, subjects had to indicate a "strong" to "moderate" right-hand preference for all items, without ever having switched hand preference during childhood, and both of their parents had to be right-handed. Four additional subjects were also tested but later eliminated for failure to meet the handedness criteria. All subjects had normal or corrected vision. They received $4.00 for their participation in the 40-minute test session.
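The inclusion rule amounts to a simple conjunction of criteria, which can be expressed as a predicate. The following sketch is purely illustrative; the field names and response coding are our own assumptions, not the checklist's actual format.

    # Hypothetical encoding of the handedness inclusion criteria described
    # above; the response labels are assumptions, not the checklist's coding.
    RIGHT_OK = {"strong_right", "moderate_right"}

    def is_familial_right_hander(item_responses, switched_in_childhood,
                                 parents_right_handed):
        """True if all 10 checklist items show a strong-to-moderate
        right-hand preference, hand preference never switched during
        childhood, and both parents are right-handed."""
        return (all(r in RIGHT_OK for r in item_responses)
                and not switched_in_childhood
                and all(parents_right_handed))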

Stimuli

The stimulus materials were generated from photographs of facial expressions by 10 normal, full-term 7- to 13-month-old infants, which were originally taken by a portrait photographer for a series of studies on infant attractiveness (Hildebrandt & Fitzgerald, 1978, 1979, 1981). They were the same original photographs as those used in the Best and Queen (1989) study. Each infant provided a neutral facial expression and either a clearly negative (i.e., crying) or a clearly positive (i.e., smiling) expression, according to ratings obtained in an independent study (Hildebrandt, 1983). Four infants had crying expressions; the other six had smiling expressions. All photographs were of full-frontal facial views.

Two black-and-white 5 x 7 inch prints were made of each infant's neutral expression, along with two prints of the infant's smiling or crying expression. For each pair, one photograph was printed in normal orientation, and the other in mirror-reversed orientation. These were used to construct four mixed-expression chimeras for each infant (see Heller & Levy, 1981). Each print was cut down the exact facial midline, defined as the line connecting the point midway between the internal canthi of the eyes and the point in the center of the philtrum just above the upper lip. The two normal orientation chimeras for a given infant were made by joining the left half of the normal orientation print of the emotional expression (smile or cry) with the right half of the normal orientation neutral expression, and by joining the right half of the emotional expression with the left half of the neutral expression. For each chimera, the midlines of the hemifaces were aligned at the eyes and nose (the mouths often could not be exactly aligned because of differing degrees of mouth opening; see also Heller & Levy, 1981), and glued to a backing sheet. The mirror-reversed chimeras were constructed in like manner from the mirror-reversed prints of the emotional and neutral expressions. Comparison of results with the normal orientation chimeras versus the mirror-reversed chimeras allowed for assessment of any influence that infant expressive asymmetries might have upon the adults' responses.
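The chimeras were of course assembled by hand from physical prints, but the cut-and-join step has a straightforward digital analogue, sketched below under our own assumptions (same-sized, pre-aligned, full-frontal grayscale images and a pre-computed midline x-coordinate; none of this is the authors' procedure).

    from PIL import Image

    def make_chimeras(emotional_path, neutral_path, midline_x):
        """Return the (emotion-left, emotion-right) chimeras of one poser."""
        emo = Image.open(emotional_path).convert("L")
        neu = Image.open(neutral_path).convert("L")
        _, h = emo.size
        # Emotion on the left: left half of the emotional print joined to
        # the right half of the neutral print.
        emotion_left = neu.copy()
        emotion_left.paste(emo.crop((0, 0, midline_x, h)), (0, 0))
        # Emotion on the right: left half of the neutral print joined to
        # the right half of the emotional print.
        emotion_right = emo.copy()
        emotion_right.paste(neu.crop((0, 0, midline_x, h)), (0, 0))
        return emotion_left, emotion_right

    # Mirror-reversed chimeras come from the same routine applied to inputs
    # flipped with img.transpose(Image.FLIP_LEFT_RIGHT).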

Before reproduction, each chimera was centered behind an oval-shaped matboard opening the size of the average photographed face, in order to screen out variations in facial outline and hair among the infants. Copies were made on a high-quality Kodak photocopier, using a gray-scale correction template that produces good resolution of photographic images. Each page contained either the pair of normal orientation chimeras for a given infant, or the pair of mirror-reversed chimeras for an infant, appearing one above the other. We used above-below rather than side-by-side pairings to avoid having subjects' choices of the more emotional chimera be confounded by hemispatial field biases resulting from the asymmetrical hemispheric activation that would be expected for such a laterally-specialized function (Kinsbourne, 1978). Each of the 10 infants was represented on four pages: 1) a normal-orientation pair in which the emotional expression was on the right side in the top picture; 2) a normal-orientation pair in which the emotional expression was on the left side in the top picture; 3) and 4) the pairs of mirror-reversed chimeras positioned as in items 1 and 2, respectively. Thus, there were 40 pages of paired chimeras. Test booklets were constructed with these pages ordered pseudorandomly, such that there were no more than 3 consecutive smiling infants or 3 consecutive crying infants, and no consecutive presentations of the same poser (one way of generating such an order is sketched below). At the top of each page one of the following questions was printed: "Which infant looks happier?" (for smiling-neutral chimeras) or "Which infant looks sadder?" (for crying-neutral chimeras).
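One simple way to satisfy the two ordering constraints is rejection sampling: shuffle the 40 pages, check the constraints, and reshuffle until a legal order appears. The sketch below is ours (the page encoding is hypothetical, and the original booklets may well have been ordered by hand), but with 10 posers contributing four pages each, a legal order is found quickly.

    import random

    def constrained_order(pages, max_run=3, seed=None):
        """Return a shuffled copy of `pages` (dicts with "poser" and
        "emotion" keys) in which no more than `max_run` consecutive pages
        share an emotion and no two consecutive pages share a poser."""
        rng = random.Random(seed)
        while True:
            order = pages[:]
            rng.shuffle(order)
            run = 1
            for prev, cur in zip(order, order[1:]):
                if cur["poser"] == prev["poser"]:
                    break                # same poser twice in a row
                run = run + 1 if cur["emotion"] == prev["emotion"] else 1
                if run > max_run:
                    break                # same-emotion run too long
            else:
                return order             # every constraint satisfied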

Procedure

Subjects were tested in groups of 5-15 in a quiet room. They sat at separate desks and each had a copy of the test booklet, with a cover sheet of instructions. They circled "TOP" or "BOTTOM" on each numbered line (1-40) of an answer sheet to indicate which member of each pair of chimeras looked happier or sadder. Subjects proceeded through the booklet at their own pace, one page at a time, without turning back or directly comparing pages.

Results

The data were converted to laterality ratios according to the formula (R-L)/(R+L), in which R = percent of chimera choices with the emotional expression on the right side of the picture (i.e., RVF preference), L = percent choices with the emotional expression on the left side (LVF preference), and R+L = 100%.1 The laterality ratios thus range from -1.0 (extreme LVF bias) to +1.0 (extreme RVF bias). These values were then entered into a 2 x 2 analysis of variance (ANOVA), with the factors of emotion (cry, smile) and orientation of the photographs used in the chimeras (normal, mirror-reversed). To determine whether the mean laterality ratios for each cell, and overall, showed a significant perceptual asymmetry (i.e., differed significantly from a score of 0 laterality), t-tests were conducted. The alpha level correction for multiple t-tests (alpha = .05/number of t-tests) required a significance level of p < .007 for the latter tests.
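The full pipeline (ratio computation, 2 x 2 repeated-measures ANOVA, and Bonferroni-corrected one-sample t-tests) is compact enough to sketch. The reconstruction below is ours, not the authors' code: the synthetic data, column names, and the divisor in the alpha correction are assumptions (the paper's corrected criterion of p < .007 implies the correction counted all reported t-tests, not only the four cell means).

    import numpy as np
    import pandas as pd
    from scipy import stats
    from statsmodels.stats.anova import AnovaRM

    # Synthetic stand-in for the real choice data: one R percentage per
    # subject per cell, where R = % choices with the emotion on the right.
    rng = np.random.default_rng(0)
    rows = []
    for subj in range(46):                      # 46 subjects, as in Experiment 1
        for emotion in ("cry", "smile"):
            for orientation in ("normal", "mirror"):
                R = rng.uniform(0, 100)         # placeholder percentage
                rows.append(dict(subject=subj, emotion=emotion,
                                 orientation=orientation,
                                 ratio=(R - (100 - R)) / 100.0))  # (R-L)/(R+L)
    df = pd.DataFrame(rows)

    # 2 (emotion) x 2 (orientation) repeated-measures ANOVA on the ratios
    print(AnovaRM(df, depvar="ratio", subject="subject",
                  within=["emotion", "orientation"]).fit())

    # One-sample t-tests of each cell mean against 0 (no asymmetry),
    # Bonferroni-corrected as alpha = .05 / number of t-tests performed
    cells = df.groupby(["emotion", "orientation"])["ratio"]
    alpha = 0.05 / len(cells)
    for key, vals in cells:
        t, p = stats.ttest_1samp(vals, 0.0)
        print(key, f"t = {t:.2f}", "significant" if p < alpha else "ns")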

There was a significant LVF bias overall (see Table 1 for all mean laterality ratios and t-tests).

Table 1. Laterality ratios for the perception of mixed-expression chimeras of smiling and crying infants in normal and mirror-reversed orientation, Experiment 1.

Effect                               Laterality ratio(a)   t value(b)      p
Overall perceptual bias                     -.13             -4.78       .0000
Emotion effect
  Smile                                     -.07             -2.57       ns
  Cry                                       -.19             -4.84       .0000
Orientation effect
  Normal orientation                        -.57            -17.84       .0000
  Mirror-reversed                           +.31              8.02       .0000
Emotion x Orientation interaction
  Smile
    Normal orientation                      -.72            -22.07       .0000
    Mirror-reversed                         +.59             15.22       .0000
  Cry
    Normal orientation                      -.42             -7.84       .0000
    Mirror-reversed                         +.03               .54       ns

(a) Computed as (R-L)/(R+L), where R = percent choices with emotional expression on right of chimera, L = percent choices with emotional expression on left, and R + L = 100%. Negative scores indicate a left visual field bias, positive scores a right field bias.

(b) One-sample t-tests of whether the mean laterality ratio differed significantly from 0, indicating a significant perceptual asymmetry.

However, a significant main effect of emotion, F(1,45) = 10.09, p < .003, indicated that the valence of the infant expressions influenced the degree of asymmetry in the adults' perception of the emotionality of the chimeras. Specifically, the LVF bias was significant for judgments of crying infants, but not for smiling infants. In addition, the orientation effect, F(1,45) = 366.68, p = .0000, revealed that the adult subjects' perceptual biases were quite sensitive to asymmetries in the infants' expressions themselves. The normal orientation chimeras, in which the infants' more expressive right hemiface (Best & Queen, 1989) appeared in the viewers' LVF, yielded a significant LVF bias. In contrast, the mirror-reversed chimeras, in which the infants' right hemiface appeared in the RVF, yielded a smaller but significant RVF bias. Finally, there was a significant Emotion x Orientation interaction, F(1,45) = 66.80, p = .0000 (see Table 1). For the smiling infants, there was a large difference in laterality ratios between the normal orientation chimeras, which showed a strong LVF bias, and the mirror-reversed chimeras, which showed a strong RVF bias. The viewers' perceptual asymmetries were less strongly influenced by orientation of the crying infant photos, showing a more moderate LVF bias for normal orientation chimeras and a very small, nonsignificant RVF bias for mirror-reversed chimeras. Simple effects tests of the interaction indicated that the orientation effect was significant, nonetheless, for both the crying expressions, F(1,45) = 28.25, p = .0000, and the smiling expressions, F(1,45) = 703.32, p = .0000. Furthermore, the emotion effect was significant for both normal orientation chimeras, F(1,45) = 24.14, p = .0000, and mirror-reversed chimeras, F(1,45) = 63.29, p = .0000.

DISCUSSION

Consistent with the hypothesis that cerebral asymmetries in emotional processes are influenced by emotional valence, the viewers in Experiment 1 showed a significant LVF bias in perception of negative but not positive infant emotional expressions. These results are compatible with the negative-valence hypothesis of cerebral organization for emotional processes (e.g., Ehrlichman, 1988). They do not as strongly support the valence hypothesis (e.g., Silberman & Weingartner, 1986; Tucker, 1981), because the LVF-RH bias in perception of negative infant expressions was not complemented by a RVF-LH bias in perception of positive expressions. The results also fail to support the motivational hypothesis of emotion lateralization (e.g., Davidson, 1984) in that infant cries and smiles did not yield a RVF-LH advantage, although both types of infant expression would be expected to elicit approach responses from adult viewers. The results also stand in contrast to the overwhelming majority of studies favoring the RH hypothesis, which have found a LVF advantage in adults' perception of adult facial expressions that is unaffected by valence (e.g., Bryden, 1982; Bryden & Ley, 1983). Only three studies on the perception of adult facial expressions have found a valence effect in neurologically intact adults, and all three involved tachistoscopic presentations (Natale, Gur, & Gur, 1983; Reuter-Lorenz & Davidson, 1981; Reuter-Lorenz, Givis & Moscovitch, 1983). Thus it is remarkable that the perception of infant expressions showed a consistent valence effect even with the free-field task used here.

It was suggested earlier that a valence effect should be optimized by the heightened emotional response that infant expressions elicit from adults according to ethological theory. Indeed, informal observations revealed that many of the subjects smiled or showed other positive emotional responses to the infant faces during the test, whereas none of them showed such responses while completing a similar test with chimeric adult expressions during the same session. The suggestion of heightened sensitivity to infant emotional expressions is further corroborated by the finding that the perceptual responses of our adult subjects were strongly influenced by the infants' own expressive asymmetries. In contrast, Levy et al. (1983) failed to find significant evidence of perceivers' sensitivity to the expressive asymmetries of their adult posers in a similar free-field study using adult chimeric expressions.2 In the present experiment, the viewers' sensitivity to asymmetrical information in the infant expressions was great enough to reverse their perceptual field asymmetries from a LVF bias when the more expressive right hemiface of the infants was on the left side of the chimera, to a smaller but significant RVF bias when the infants' right hemiface was on the right side of the chimera.

The interaction between the valence of the infants' emotional expressions and the orientation of the chimeras indicates that the relative impact of the viewers' perceptual biases and the infants' expressive asymmetries differed between judgments of negative and positive expressions. The pattern of this interaction provides additional support for the negative-valence effect. Although the left/right position of the infants' more expressive right hemiface within the chimeras influenced perception of both types of expressions, it was a much weaker determinant of judgments about cries than about smiles. That is, infant expressive asymmetries had relatively less influence, and the viewers' LVF-RH bias relatively more influence, on the adults' responses to cries than to smiles.

Thus, Experiment 1 provided clear evidence of a negative-valence effect on asymmetries in adults' perception of infant emotional expressions. However, it did not elucidate the underlying perceptual processes responsible for the phenomenon. As suggested in the introduction, one possible source of the effect might be that negative expressions are perceived in terms of the configuration of the whole face (i.e., the gestalt of the features within the "frame" of a face outline and hair), whereas perception of positive expressions may focus upon the mouth as a single distinguishing feature (Moscovitch, 1983). The former approach would call more heavily upon the holistic abilities of the right hemisphere, while the latter approach would be more suited to the feature-oriented analytic abilities of the left hemisphere (e.g., Bradshaw & Nettleton, 1981; Levy, 1974). If the influence of emotional valence is attributable to such differences in the perceptual approach to crying and smiling expressions, then the negative-valence effect, and indeed the overall LVF bias, should become attenuated when the viewers' attention is progressively restricted to narrower sources of emotional information in the infants' faces, such as the patterning of the central facial features without the contextual "frame" of the facial outline, hair, and other peripheral details. This manipulation should lead subjects to use a more feature-oriented, analytic approach, and thus to rely more heavily on left hemisphere information processing strategies. Alternatively, it may be that the viewers' actual emotional responses to crying and smiling infants, rather than the information processing strategy, are responsible for the valence effect. If so, then the negative-valence effect and the overall perceptual field bias should appear even when the viewer's attention is focused away from the gestalt of the faces and toward subcomponents or specific features. The next two experiments were designed to systematically examine these possibilities.

EXPERIMENT 2

If the holistic or gestalt-like perceptual specialization attributed to the right hemisphere is responsible for the finding of a LVF bias for crying but not smiling expressions, we would expect that perception of cries focuses on the gestalt of the whole face, i.e., the patterning of facial features within the configurational "frame" of the face. If the crying expressions evoked a greater degree of right hemisphere involvement because they were perceived in a more holistic manner, then removal of peripheral configurational information such as the facial outline should attenuate or eliminate the negative-valence effect. If, on the other hand, the valence effect derives from the emotional nature of the response to the infant expressions, then it should persist even for judgments of the central facial features alone.

To restrict the viewers' attention to the patterning of the central features of the eyes/brows, mouth and nose, we deleted the unwanted peripheral details (e.g., face outline, hair, cheeks) from optically-digitized versions of the original photographs via computer. A new group of subjects made choices between each pair of the mixed-expression chimeras generated from these computer-edited infant expressions, as in Experiment 1.

Method

Subjects

Ninety-six familial right-handed university and high school students (51 female, 45 male) participated in Experiment 2. All had normal or corrected vision, and all had participated in the Best and Queen (1989) study. The university students received $4.00 for their participation in the 45-minute session; the high school students were unpaid volunteers.

Stimuli

High-quality photocopies of the original photographs from Experiment 1 were computer-digitized and edited, using an Apple Macintosh 512+ computer (see Best and Queen, 1989, for details). The cheeks, ears, chin, hair, and face outline were removed from the digitized pictures, and the resulting edited images were printed in both normal and mirror-reversed orientation. These were then used to generate mixed-expression chimeras of each infant (see examples, Figure 1), which were assembled into a 40-page test booklet and reproduced with a high-quality photocopier, as in Experiment 1.

Figure 1. Examples of digitized mixed-expression chimeras of a smiling and a crying infant, with the emotional expressions on the left versus the right side of the picture, Experiment 2.


Procedure

Subjects completed the test booklet under the same conditions and instructions as in Experiment 1.

Results

The data were transformed to laterality ratios and analyzed as in Experiment 1. The significance level for the multiple t-tests was set at p < .007, as before.

Again, there was a significant overall LVF bias in perception of the emotional chimeras (see Table 2 for mean laterality ratios and t-tests). The magnitude of this LVF bias did not change significantly from that found in Experiment 1. The emotion effect was significant, as in Experiment 1, F(1,95) = 7.76, p < .007, again indicating a stronger LVF bias for crying expressions than for smiling expressions. The magnitude of the difference in visual field biases between the crying and the smiling infants did not differ significantly from that found in Experiment 1, according to t-test. The orientation effect was also significant, F(1,95) = 432.01, p = .0000, indicating that the infants' expressive asymmetries affected the viewers' judgments. There was a significant LVF bias when the infants' more expressive right hemiface was on the left side of the chimeras, but a RVF bias when it was on the right side. The Emotion x Orientation interaction was also significant, F(1,95) = 63.64, p = .0000, following the pattern found in Experiment 1. For the smiling infants, the normal orientation chimeras showed a strong LVF bias, and the mirror-reversed chimeras showed a strong RVF bias (see Table 2). The orientation of the crying infant photos showed a similar but weaker influence, yielding a moderate LVF bias for normal orientation chimeras, and a much smaller RVF bias for mirror-reversed chimeras. According to simple effects tests of this interaction, nevertheless, the orientation effect was significant for both the crying expressions, F(1,95) = 13.23, p < .0005, and the smiling expressions, F(1,95) = 65.76, p = .0000, and the emotion effect was significant for both normal orientation chimeras, F(1,95) = 559.26, p = .0000, and mirror-reversed chimeras, F(1,95) = 104.62, p = .0000.

Table 2. Laterality ratios for the perception of mixed-expression chimeras of smiling and crying infants in normal and mirror-reversed orientation, Experiment 2.

Effect                               Laterality ratio(a)   t value(b)      p
Overall perceptual bias                     -.11             -5.23       .0000
Emotion effect
  Smile                                     -.07             -3.16       .003
  Cry                                       -.15             -5.62       .0000
Orientation effect
  Normal orientation                        -.49            -19.13       .0000
  Mirror-reversed                           +.26              8.83       .0000
Emotion x Orientation interaction
  Smile
    Normal orientation                      -.57            -19.66       .0000
    Mirror-reversed                         +.42             12.34       .0000
  Cry
    Normal orientation                      -.41            -11.08       .0000
    Mirror-reversed                         +.11              2.94       .005

(a) Computed as (R-L)/(R+L), where R = percent choices with emotional expression on right of chimera, L = percent choices with emotional expression on left, and R + L = 100%. Negative scores indicate a left visual field bias, positive scores a right field bias.

(b) One-sample t-tests of whether the mean laterality ratio differed significantly from 0, indicating a significant perceptual asymmetry.


DISCUSSION

The results of Experiment 2 replicated those of Experiment 1, even though the gestalt of the whole faces had been modified by removal of the facial outline and of extraneous details other than the pattern of the central facial features. In fact, the magnitude of the effects failed to differ significantly from those found in Experiment 1. These findings suggest that the viewers' perception of the more complete chimeric photographs in the previous study had been based upon the central facial features rather than on their relation to peripheral information such as the "frame" of the facial outline and hair. It also suggests that the negative-valence effect is due to the emotional nature of the perceptual response to the faces, rather than to differential involvement of the right hemisphere's putative holistic approach and the left hemisphere's putative feature-analytic approach to negative vs. positive expressions, respectively.

Perhaps, however, the stimulus manipulations of Experiment 2 did not provide sufficient interference with the gestalt of the facial expressions to disrupt the right hemisphere's greater holistic response to crying than to smiling expressions. A clearer disruption of the holistic approach would involve focusing the viewers' attention on even more narrowly-defined features of the faces, such as the mouth or the eye region, both of which would be expected to carry much of the information conveyed in an emotional expression. The third experiment investigated the possibility that such a narrow feature-oriented approach would influence the negative-valence effect.

EXPERIMENT 3

The purpose of the third experiment was to determine the effect of restricting the viewers' attention to the emotional expression displayed in single, isolated facial features, which should more definitely bias the perceptual approach toward the analytic, feature-oriented abilities ascribed to the left hemisphere. If information processing differences in the perception of smiles as compared to cries were responsible for the negative-valence effect in the earlier studies, then this manipulation should either eliminate the valence effect in perceptual asymmetries or shift it to a strong RVF bias for smiling expressions with a weak or nonexistent LVF bias for crying expressions. However, if the negative-valence effect arises instead from the emotional nature of the judgments, then the pattern of findings and the magnitude of the effects should be impervious to this manipulation.

In restricting viewers' attention to specific facial features, we focused on the expressive patterning of the mouth versus that of the eyes. In a previous paper (Best & Queen, 1989), we had found that the infants' right hemiface bias in expressiveness was specific to the mouth region of the face, and was not present in the eye region, even though the viewers had no difficulty making emotionality judgments about pairs of chimeras generated from either of these isolated facial regions. Because cortical input to the mouth region is contralateral, whereas input to the eye region is bilateral, those earlier results had suggested that lateralized cortical specializations, rather than more peripheral factors, are responsible for the right hemiface bias in infant expressions. Thus, a second purpose of the present experiment was to test whether adults' perceptual asymmetries are influenced by the difference in asymmetrical patterning between the eye and the mouth regions of the infants' expressions. For Experiment 3, a new group of judges was presented with an "upper face" test and a "lower face" test that employed further modifications of the digitized, edited infant expressions developed in Experiment 2.

Method

Subjects

Fifty-four familial right-handed university students (27 female, 27 male) participated in this experiment. They received $4.00 for participating in a 45-minute session. All had normal or corrected vision, and all had participated in the Best and Queen (1989) study.

Stimuli

The digitized, edited faces from Experiment 2 were again revised to produce an "upper face" test, for which all facial features other than the eyes, brows and bridge of the nose were removed, and a "lower face" test, for which all features other than the mouth and the tip of the nose were eliminated. Mixed-expression chimeras were generated separately for the eyes/brows and for the mouth (see Figure 2).3 Two 40-page test booklets were constructed as in the previous experiments, one for the "lower face" test and one for the "upper face" test. These were duplicated on a high-quality photocopier, as before.


Figure 2. Examples of digitized mixed-expression chimeras of the eye region and the mouth region of a smiling and a crying infant, with the emotional expressions on the left versus the right side of the picture, Experiment 3.

Procedure

The subjects were tested under the same conditions and instructions as in the first two experiments. They each completed the "lower face" test first and the "upper face" test second. Pilot testing suggested that judgments about the eyes/brows might be more difficult than judgments about the mouth; this test order thus allowed more practice before the more difficult test.

Results

The data were handled as in Experiments 1 and 2, except that the ANOVA for this experiment included a third, new factor: face part (mouth vs. eyes), and the alpha level adjustment for multiple t-tests set the significance level at p < .002.

Once again, there was an overall LVF bias (see Table 3 for mean laterality ratios and t-tests). The magnitude of this LVF bias did not differ significantly from that found in either of the previous experiments. The emotion effect was again significant, F(1,53) = 13.96, p < .0005, indicating that the valence of the infant emotion continued to influence the viewers' perception of the expressions, even when their attention was restricted to isolated facial features. The crying infants elicited a significant LVF bias, which reversed to a nonsignificant RVF bias for the smiling infants. The magnitude of this valence effect again failed to differ significantly from that found in Experiments 1 and 2. As in the first two experiments, there was also a significant orientation effect, F(1,53) = 39.70, p = .0000. There was a LVF bias only for judgments of normal orientation expressions, when the infants' right hemiface was on the left side of the chimeras. The mirror-reversed chimeras elicited a small, nonsignificant RVF bias. The Emotion x Orientation interaction was significant as well, F(1,53) = 8.97, p < .004. Also as before, orientation had a smaller effect on perceptual asymmetries in response to crying than to smiling expressions. There was a LVF bias for crying infants, which was significant for the normal orientation chimeras, but not for mirror-reversed chimeras. In contrast, the normal orientation smiling infants evoked an even larger LVF bias, but the mirror-reversed smile expressions produced a significant RVF bias. According to simple effects tests of this interaction, the orientation effect was significant for both the crying expressions, F(1,53) = 5.13, p = .02, and the smiling expressions, F(1,53) = 76.97, p = .0000. However, the emotion difference was significant only for the mirror-reversed chimeras, F(1,53) = 21.28, p = .0000, and not for the normal orientation chimeras.

Table 3. Laterality ratios for the perception of mixed-expression chimeras of smiling and crying infants in normal and mirror-reversed orientation, eye versus mouth, Experiment 3.

Effect                               Laterality ratio(a)   t value(b)      p
Overall perceptual bias                     -.08             -4.44       .0000
Emotion effect
  Smile                                     -.02             -0.86       ns
  Cry                                       -.14             -5.14       .0000
Orientation effect
  Normal orientation                        -.19             -8.47       .0000
  Mirror-reversed                           +.03              1.09       ns
Emotion x Orientation interaction
  Smile
    Normal orientation                      -.17             -6.99       .0000
    Mirror-reversed                         +.14              4.79       .0000
  Cry
    Normal orientation                      -.20             -5.62       .0000
    Mirror-reversed                         -.08             -1.85       ns
Face part x Orientation interaction
  Mouth region
    Normal orientation                      -.35            -10.31       .0000
    Mirror-reversed                         +.20              4.93       .0000
  Eye region
    Normal orientation                      -.03             -1.10       ns
    Mirror-reversed                         -.14             -3.98       .0002
Face part x Orientation x Emotion interaction
  Mouth region
    Smile
      Normal orientation                    -.57            -17.31       .0000
      Mirror-reversed                       +.52             11.92       .0000
    Cry
      Normal orientation                    -.13             -2.32       ns
      Mirror-reversed                       -.11             -1.93       ns
  Eye region
    Smile
      Normal orientation                    +.23              5.67       .0000
      Mirror-reversed                       -.24             -4.95       .0000
    Cry
      Normal orientation                    -.28             -6.2        .0000
      Mirror-reversed                       -.05             -0.97       ns

(a) Computed as (R-L)/(R+L), where R = percent choices with emotional expression on right of chimera, L = percent choices with emotional expression on left, and R + L = 100%. Negative scores indicate a left visual field bias, positive scores a right field bias.

(b) One-sample t-tests of whether the mean laterality ratio differed significantly from 0, indicating a significant perceptual asymmetry.


The new face part factor also entered into two significant interactions. The Face part x Orientation interaction, F(1,53) = 101.86, p = .0000, indicated that the infants' expressive asymmetries had a greater influence on perception of the mouth than of the eyes. The normal orientation chimeras of the mouth yielded a large LVF bias in perception, while the mirror-reversed versions yielded a large RVF bias. In contrast, the eye chimeras produced a smaller but significant LVF bias when presented in mirror-reversed orientation, which became nonsignificant for the normal orientation eyes. Simple effects tests of the Face part x Orientation interaction showed that the face part difference was significant for both the normal orientation chimeras, F(1,53) = 62.39, p = .0000, and the mirror-reversed ones, F(1,53) = 44.88, p = .0000. Moreover, the orientation effect was significant both for the mouth, F(1,53) = 119.58, p = .0000, and for the eyes, F(1,53) = 6.68, p < .01.

The Face part x Orientation x Emotion interaction was also significant, F(1,53) = 165.97, p = .0000. The results for the mouth region chimeras followed the pattern found in the earlier experiments. The smiling mouths produced a large LVF bias for normal orientation chimeras, and a large RVF bias for mirror-reversed ones. Yet orientation had no appreciable effect on perception of the crying mouths, which showed a nonsignificant LVF bias for both normal orientation and mirror-reversed chimeras. The crying eyes yielded a LVF bias that was significant for normal orientation chimeras, but not for mirror-reversed ones, also consistent with the earlier findings. The smiling eyes, however, elicited a RVF bias for normal orientation chimeras, but a LVF bias for mirror-reversed chimeras. The direction of this orientation effect for judgments of smiling eyes was the opposite of the pattern of the Emotion x Orientation interactions found in Experiments 1 and 2, where the normal orientation was associated with LVF bias and the mirror-reversed orientation with RVF bias. Thus, the smiling eyes appear to be responsible for the Face part x Orientation interaction. Nonetheless, it is still clear that, for judgments of both the mouth and the eyes, orientation had a greater effect on perceptual responses toward the smiling expressions than toward the crying expressions. According to simple effects tests, the orientation effect was significant for crying eyes, F(1,53) = 53.48, p = .0000, for smiling mouths, F(1,53) = 457.26, p = .0000, and for smiling eyes, F(1,53) = 11.06, p < .002, but not for crying mouths. Also, the emotion effect was significant for eyes in normal orientation, F(1,53) = 56.96, p = .0000, and in mirror-reversed orientation, F(1,53) = 9.72, p < .003, as well as for mouths in normal orientation, F(1,53) = 58.74, p = .0000, and in mirror-reversed orientation, F(1,53) = 90.83, p = .0000.

DISCUSSION

The significant emotion effect was not diminished relative to the two earlier experiments, in spite of restricting the viewers' attention to isolated facial features. This result strongly suggests that the negative-valence effect on asymmetries in perception of infant emotional expressions derives from the emotional nature of the judgments rather than from differences in the information processing of negative versus positive expressions. Moreover, differences in perception of the eye and mouth regions suggest that the viewers were sensitive to differences in the expressive asymmetries displayed by those facial regions. Consistent with the Best & Queen (1989) finding that the right hemiface bias in infant expressions was significant only for the mouth region, the viewers in the present study were more affected by the orientation of the mouth than by the orientation of the eyes. For both the eyes and the mouth, however, the negative-valence effect held up (the Face part x Emotion interaction was not significant), in that there was a weaker orientation effect, or greater effect of perceptual asymmetry, for crying expressions than for smiling expressions. That is, the LVF bias in perception of infant emotional expressions was stronger (less affected by the infants' expressive asymmetries) for crying than for smiling expressions, suggesting greater right hemisphere involvement in perception of negative than of positive emotions.

GENERAL DISCUSSION

The results of all three experiments support the hypothesis that the right hemisphere is specialized for perception of negative emotion, but that perception of positive emotion is less strongly lateralized, a view we have referred to as the negative-valence hypothesis (e.g., Ehrlichman, 1988). The valence hypothesis of RH specialization for negative emotion, but LH specialization for positive emotion, was not supported, in that perception of infant smiles failed to show a significant RVF-LH bias. Support was also lacking for the motivational hypothesis, because the approach response that adults would be expected to show toward both crying and smiling infants did not result in a RVF-LH perceptual bias, as predicted.

Given that the majority of studies on asymmetries in perception of adults' facial expressions have instead supported the RH hypothesis (that the right hemisphere is specialized for all emotion, regardless of valence), the present findings suggest that the influence of valence on perceptual asymmetries may depend on some involvement of emotional responses in the viewer. Our task, like those used in other reports that have supported variants of the valence hypothesis, called for judgments about the emotionality of the stimulus expressions. In contrast, discrimination or recognition judgments were required by many of the studies that obtained an overall RH advantage for emotion perception.

However, judgments about emotionality may not alone suffice to produce a negative-valence effect on perceptual asymmetries. Levy et al. (1983) presented subjects with pairs of mixed-expression chimeras of half-neutral, half-smiling adult faces in a free-field task and asked for judgments about the relative emotionality of each pair, as in the present study, yet those researchers obtained a significant LVF-RH advantage for perception of positive emotion. Our use of infant facial expressions to increase the likelihood of the viewers' emotional response to the stimuli may have been important in obtaining an influence of valence on perceptual asymmetries. This suggestion is corroborated by our finding that the subjects in Experiments 1 and 2 showed a strong, significant LVF bias in their perception of the Levy et al. (1983) adult chimeric smiling expressions (Best & Queen, 1987), whereas they had instead shown a nonsignificant (Experiment 1) or small LVF bias (Experiment 2) in their perception of infant chimeric smiling expressions. Even stronger support is provided by Chaiken (1988), who used the same free-field chimeric face technique to compare adults' perceptual asymmetries for smiling and crying adult expressions versus smiling and crying infant expressions. Her subjects showed a significant valence effect in response to the infant expressions, but no valence effect in response to the adult expressions. It should be noted, however, that the viewers' actual emotional responses to the infant and adult stimuli were not directly assessed in any of these studies. Thus, further research is needed to test the hypothesis that emotional responses in the viewer are crucial in producing a valence effect on asymmetries in the perception of emotional expressions.

The results of Experiments 2 and 3 also indicate that the negative-valence effect on perception of infant expressions cannot be explained by differences in the balance between the basic information processing approaches of the two hemispheres during the perception of negative versus positive emotions. Although stimulus manipulations designed to focus the viewers' attention on progressively more restricted features of the infant facial expressions might have shifted perception toward the putative analytical, feature-oriented approach of the LH, these manipulations did not influence the overall degree of asymmetry in emotion perception. Nor, more importantly, did the manipulations change the magnitude of the valence effect on perception. These findings thus suggest that the negative-valence effect in perception of infant emotional expressions reflects an aspect of hemispheric specialization that is independent of information processing asymmetries. The most likely basis for this separate characteristic of hemispheric specialization is an asymmetry in emotional responsiveness to stimuli, such that the RH shows greater sensitivity in the perception of negative emotion.

It is interesting to note that the adult's LVF-RH bias in perception of infants' crying expressions is compatible with recent findings that emotional expressions are more intense on the right side of the infant face (Best & Queen, 1989; Rothbart, Taylor & Tucker, 1989). That is, in face-to-face interactions, the infant's more expressive hemiface would appear in the adult's more sensitive visual hemispatial field (see Introduction; also Kinsbourne, 1978), presumably increasing the likelihood of the adult's emotional response to the infant. This pattern of spatial compatibility between perceiver and producer of an emotional expression does not hold in the case of adults interacting face-to-face with other adults. Adults instead show a left hemiface bias in expressiveness, which would place the more expressive hemiface in the viewer's less sensitive hemispatial field. The enhancement of adults' sensitivity and responsiveness to infant expressions, relative to adult expressions, is consistent with ethological theory, as discussed in the general introduction. But why should there be greater compatibility between infant expressive asymmetry and adult perceptual asymmetry in the case of crying expressions than of smiling expressions? Perhaps this can be related to differences in the imperativeness of adult responses to infant distress and pleasure states. Presumably, infant distress may indicate some possible danger to the infant, requiring immediate action on the part of the caregiver or other adult, whereas an infant's smile does not signal the need for such immediate action. Therefore, the evolutionary pressure for enhanced responsiveness to infant crying expressions would have been greater than the pressure for enhanced responsiveness to infant smiles.

REFERENCESAhern, C. L., Sc Schwartz, G. E. (1979). Differential lateralization

fur positive versus negative emotion. Neuropsydalogia,17 , 693-698.

Best, C T., & Queen, H. F. (1989). Baby, it's in your smile RightMadison bias in infant emotional expressions. DevdoprnentalPsychology, 25, 264-276.

Best, C T., & Queen, H. F. (1987). (The free-field chimeric facetechnique is sensitive to asymmetries in adult posers' fadalapressiond Unpublished data.

Bowlby, J. (1969). Attachment and loss (Vol. 1). New York: BasicBooks.

Bradshaw, J. L, & Nettleton, N. C. (1981). The nature ofhemispheric specialization in man. The Behavioral and BrainSciences, 4, 51-91.

Brody, N., Goodman, S. E., Halm, E., ICrinzman, S., & Sebrechts,M. (1987). Lateralized affective priming of la teralized

effectively valued target words. Neutopsydologia, 25, 935-946.Bryden, M. P. (1982). literality: Functional asymmetry in the intact

brain. New York: Academic Press.Bryden, M. P. & Ley, R. C. (1983). Right hemispheric involvement

in imagery and affect. In E. Pereanan Cognitive processingin the right hemisphere. New York: Academic Press.

Bryden, M. P., & Sprott, D. A. (1981). Statistical determination ofdegree of laterality. Neuroprychologia,29, 571-581.

Buck, R. (1986). The psychology of emotion. In J. E. LeDoux & W.Hirst (Eds.), Mind and brain: Dialogues in cognitive neuroscience(pp. 275-300). London, England: Cambridge University Press.

Campbell, R. (1978). Asymmetries in interpreting and expressinga posed facial expression. Cortex, 19,327.342.

Campos, I. I., Barrett, K. C., Lamb, M. E., Goldsmith, H. H., ScStenberg, C (1983). Socioemotional development. In P. H.Mussen (Ed.), M. M. Heidi Sc 1.1. Campos (Vol. Eds.), Handbookof child development: Vol. 11. Infancy and developmentalpsydabiology (pp. 784-915). New York: John Wiley Sc Sons.

Carlson, D. F., Sc Nerds, L J. (1985). Perception of positive andnegative emotion in free viewing of asymmetrical faces: Sexand handedness effects. Paper presented at the InternationalNeuropsychology Society meeting February 6-8, Son Diego.

Chaiken, S. (1988). A free vision test of developmental changes inhemispheric asymmetries for perception of infant and adult emotionalexpressions. Unpublished M. A. thesis, Wesleyan University,Middletown CT.

Chemist% B. D., Sc Cowan% H. K. (1975). Functional asymmetryin the face. Ada Anatomic., 91, 154-160.

Davidson, R. J. (1984). Affect, cognition, and hemisphericspecialization. In C E. Izard, J. Kagan, Sc R. Zajonc (Eds.),Emotion, cognition and behavior (pp. 320365). New York, NY:Cambridge University Press.

Davidson, R. J., Sc Fox, N. A. (1982). Asymmetrical brain activitydiscriminates between positive and negative affective stimuli inhuman infants. Science, 218,1235 -1237.

Davidson, R. J., Schwartz, C. E., Saron, C, Bennett, J., Sc Coleman,D. J. (1979). Frontal versus parietal EEC asymmetry duringpositive and negative affect. Psychophysiology,16, 202-203.

Dimond, S. J., & Farrington, L (1977). Emotional response to filmsshown to the right and left h ..misphere of the brain measuredby heart rate. Acta Psydsologica, 41,255.260.

Dimond, S. J., Farrington, L., Sc Johnson, P. (1976). Differingemotional response from right and left hemispheres. Nature,261, 690692.

EhrlicK'nan, H. (1988). Hemispheric asymmetry and positive-negative affect In D. Ottoson (Ed.), Duality and the unity of thebrain. London: MacMillan Press.

Elbl- Elbesfeldt, I. (1975). Ethology: The biology of behavior. NewYork: Holt, Rhinehart and Winston.

Ekman, P. (1972). Universals and cultural differences in facialexpressions of emotion. Mange symposium on motivation (pp.207-283). Lincoln: University of Nebraska Press.

Ekman, P., Sc Fatten, W. V. (1978). Facial action coding system. PaloAlto, CA: Consulting Psychologists Press.

Emde, R. N., Gaensbauer, T. J., Sc Harmon, R. J. (1976). Emotionalexpression in infancy: A biobehavioral study. New York:International Universities Pres. .

Fox, N. A., Sc Davidson, R. J. (1986). Taste-elicited changes infacial signs of emotion and the asymmetry of brain electricalactivity in newborn infants. Nesoopsychologia, 24, 417-422.

Fox, N. A., Sc Davidson, R. J. (198?). Plectroencephalogramasymmetry in response to the approach of a stranger andmaternal separation in 10-month-old infants. DevelopmentalPsychology, 23, 233-240.

Fox, N. A., Sc Davidson, a. 1. (1988). Patterns of brain electricalactivity during facial signs of emotion in 10-month-olds.Developmental Psychology, 24, 230236.

Gage, D. F., Sc Safer, M. A. (1985). Hemisphere differences in themood state - dependent effect for recognition of emotional faces.Punta! of Experimental Psychology: Learning, Memory andCognition, 11, 752-763.

Gainotti, C. (1972). Emotional behavior and hemispheric side oflesion. Cortex, 8, 41-55.

Gainotti, C. (1988). Disorders of emotional behavior and ofautonomic arousal resulting from unilateral brain damage. InD. Ottoson (Ed.), Duality and the unity of the brain. London:MacMillan Press.

Grego, D. M., Sackeim, H. A., Sanchez, E, Cohen, B. H., Sc Hough,S. (1988). Perceiver bias in the processing of human faces:Neuropsychological mechanisms. Cortex, 24, 91-117.

Heller, W., Sc Levy, J. (1981). Perception and expression ofemotion in right-handers and left-handers. Neumpsydiologis, 19,263.272.

Hildebrandt, K. A. (1983). Effect of facial expression variations onratings of infants' physical attractiveness. DevelopmentalPsychology, l9, 414-417.

Hildebrandt, K. A., Sc Fitzgerald, H. E. (1978). Adults' responsesto infants varying in perceived cuteness. Behavioural Processes,3, 159-172.

I-Wdebrandt, K. A., Sc Fitzgerald, H. E (1979). Adults' perceptionsof infant sex and cuteness. Sex Roles, .5, 471-481.

Hildebrandt, K. A., Sc Fitzgerald. H. E (1981). Mothers' responsesto infant physical appearance. Infant Mental Health Journal, 2,56-61.

Hirschman, R. S., Sc Safer, M A. (1982). Hemisphere differences inperceiving positive and negative emotions. Cortex, 18, 569-580.

21

Page 219: ED 321 318 - ERIC · Vial L Hanson Katherine S. Harris Leonard Kate Rena Arens Krakow Andrea G. Levitt Alvin M. Liberman Isabelle Y. Liberman Diane Lilt-Martin Leigh Lisker' Anders

210 Best and Queen

iard, C. (1979). The maximally diraintimuity heal namentent codingyam Newark, DE: University of Delaware Press.

Izard, C., Huebner, R. L, Risser, D., Mcannes, G. C., &Dougherty, I.. M. (1980). The young infant's ability to producediscrete emotional expressions. Developmental Psychology, 16,132-140.

Kinsbourne, M. (1978). The biological determinants of functionalbisymmetry and asymmetry. In M. Kinsbourne (Ed.),Asymmetrical functions of the brain. New York: CambridgeUniversity Press.

Kuhn, C. M. (1973). The Phi coefficient as an index of eardifferences in dichotic listening. Cortex, 9,447.457.

Ladavas, E., Unlit., C., & Ricci-BM, P. E. (1980). Evidence for sexdifferences in right hemisphere dominance for emotions.Neuropsydelogia,18, 361-366.

Levy, J. (1974). Psychobiological implications of bilateralasymmetry. In S. J. Dimond & G. Beaumont (Eds.), HemisphereJunction in the human brain. London: Eh& Books.

Levy, J., Heller, W., Banich, M. T., & Burton, L. A. (1983).Asymmetry of perception in free viewing of chimeric faces.Brain and Cognition, 2,404-419.

Ley, L C, & Bryden, M. P. (1979). Hemispheric differences inrecognizing faces and emotions. Brain and Language, 7,127-138.

Ley, R. C. & Bryden, M. P. 0951). Consciousness, emotion, andthe right hemisphere. In L Stevens & G. Underwood (Eds.),Aspects of consciountess: Smarms, WINS. New York: AcademicPress.

Lorenz K. Z. (1935). Studies in animal and human behavior.(VoL 1).Cambridge, MA: Harvard University Press.

Lorenz, K. Z. (1981). The foundations of ethology. New York:Springer-Verlag.

Lorenz, K. Z., & Leyhausen, P. 0973). Motivation of human andanimal behavior. New York: Van Nostrand.

Monad-Krohn, G. FL (1924). On the dissociation of voluntary andemotional innervation in facial paralysis of central origin. Brain,47,22-31

Moscovitch, M. (1983). 'The 'inside& and emotional functions ofthe normal right hemisphere. In E. Pareanan (Ed.), Cognitiveprocessing in the right hemisphere. New York: Academic Press.

Natal., M., Cur, R. E., Sc Cur, R. C. (1983). Hemisphericasymmetries in processing emotional expressions.Neuropsydelogia, 21,515565.

Oster, H., & Ekman, P. (1978). Facial behavior in childdevelopment. In W. A. Collins (Ed.), Minnesota Symposium onchild psychology (Vol. Z pp. 231-276). Hillsdale, NJ: LawrenceErlbaunt Associates.

Remillard, G. M., Andaman, F., Rhi-Sausi, A., & Robbins, N. M.0977). Facial asymmetry in patients with temporal lobeepilepsy: A clinical sign useful in the 'sterilization of tenToralepileptic fod. Neurology, 27,109-114.

Reuter-Lorenz, P., & Davidson, R. J. (1981). Differentialcontributions of the two cerebral hemispheres to the perceptionof happy and sad faces. Nampsyclelogia,19, 609-613.

Reuter-Lorenz, P. A., Qv* It. P., & Moscovitch, M. (1983).Hemispheric specialization and the perception of emotion:Evidence from right-handers and from inverted and non-inverted left-handers. Neuropsychologia, 21,687-692.

Rink W. E. (1984). The neuropsychology of facial expression: Areview of the neurological and psychological mechanisms forproducing facial expressions. Psychological Bulletin, 35, 52-77.

Rossi, G. F., & Rosadini, G. (1967). Experimental analysis of cerebral dominance in man. In C. H. Millikan & F. L. Darley (Eds.), Brain mechanisms underlying speech and language. New York: Grune & Stratton.

Rothbart, M. K., & Posner, M. I. (1985). Temperament and the development of self-regulation. In L. C. Hartlage & C. F. Telzrow (Eds.), The neuropsychology of individual differences: A developmental perspective (pp. 93-123). New York: Plenum Press.

Rothbart, M. K., Taylor, S. B., & Tucker, D. M. (1989). Right-sided facial asymmetry in infant emotional expression. Neuropsychologia, 27, 675-687.

Sackeim, H. A., Greenberg, M. S., Weiman, A. L., Gur, R. C., Hungerbuhler, J. P., & Geschwind, N. (1982). Hemispheric asymmetry in the expression of positive and negative emotions. Archives of Neurology, 39, 210-218.

Sackeim, H. A., & Gur, R. C. (1978). Lateral asymmetry in intensity of emotional expression. Neuropsychologia, 16, 473-481.

Sackeim, H. A., & Gur, R. C. (1980). Asymmetry in facial expression. Science, 209, 834-836.

Safer, M. A. (1981). Sex and hemisphere differences in access to codes for processing emotional expressions and faces. Journal of Experimental Psychology: General, 110, 86-100.

Segalowitz, S. J. (1985). Hemispheric asymmetries in emotional faces. Paper presented at the International Neuropsychological Society meeting, February 6-8, San Diego.

Silberman, E. K., & Weingartner, H. (1986). Hemispheric lateralization of functions related to emotion. Brain and Cognition, 5, 322-353.

Sroufe, L. A. (1979). Socioemotional development. In J. Osofsky (Ed.), Handbook of infant development. New York: John Wiley.

Strauss, E., & Moscovitch, M. (1981). Perceptual asymmetries in processing facial expression and facial identity. Brain and Language, 13, 308-322.

Terzian, H. (1964). Behavioral and EEG effects of intracarotid sodium amytal injection. Acta Neurochirurgica, 12, 230-239.

Tucker, D. M. (1981). Lateral brain function, emotion, and conceptualization. Psychological Bulletin, 89, 19-46.

FOOTNOTES

*Also Wesleyan University, Middletown, CT.
**Wesleyan University.
1. Performance level corrections such as the Phi coefficient (Kuhn, 1973) or λ (Bryden & Sprott, 1981) are neither necessary nor applicable with binary forced-choice data.
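A compact statement of the index may help here; what follows is the standard 2x2 phi coefficient, and Kuhn's (1973) exact formulation of the ear-difference version may differ in detail. For cell counts $a, b, c, d$ in an ear (right/left) by response (correct/incorrect) table,

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}.$$

Because the denominator is built from the marginal totals, $\phi$ scales the raw ear difference by the listener's overall accuracy. Binary forced-choice judgments yield no correct/incorrect margin, so there is no performance level for such an index to correct.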

2. However, this failure may be due, in part, to the manner in which those researchers paired their adult chimeras. Whereas we always paired the two normal-orientation chimeras or the two mirror-reversed chimeras of a given infant for forced-choice judgments, Levy et al. (1983) had always paired a normal-orientation chimera with its own mirror image. Their approach may have masked their viewers' sensitivity to poser asymmetries. In an unpublished extension of their study, we modified the pairings of the Levy et al. adult chimeras such that both members of each pair were in the same orientation (Best & Queen, 1987). Under those conditions we found a significant influence of poser asymmetries upon the viewers' perceptual field biases. Nonetheless, even in our extension of Levy et al., the influence of adult poser asymmetries upon the viewers' perceptual biases was very much smaller than that found with our infant faces. It should also be noted that the Levy et al. stimuli included only smiling expressions, which yielded the largest influence of poser asymmetries for infant expressions in the present study.

3. The eye and mouth regions were not separated from one another until after the midline had been drawn on each face from the point midway between the internal canthi and the center of the philtrum, as in Experiment 1.


Appendix

SR # Report Date DTIC # ERIC #

SR-21/22 January-June 1970 AD 719382 ED 044-679
SR-23 July-September 1970 AD 723586 ED 052-654
SR-24 October-December 1970 AD 727616 ED 052-653
SR-25/26 January-June 1971 AD 730013 ED 056-560
SR-27 July-September 1971 AD 749339 ED 071-533
SR-28 October-December 1971 AD 742140 ED 061-837
SR-29/30 January-June 1972 AD 750001 ED 071-484
SR-31/32 July-December 1972 AD 757954 ED 077-285
SR-33 January-March 1973 AD 762373 ED 081-263
SR-34 April-June 1973 AD 766178 ED 081-295
SR-35/36 July-December 1973 AD 774799 ED 094-444
SR-37/38 January-June 1974 AD 783548 ED 094-445
SR-39/40 July-December 1974 AD A007342 ED 102-633
SR-41 January-March 1975 AD A013325 ED 109-722
SR-42/43 April-September 1975 AD A018369 ED 117-770
SR-44 October-December 1975 AD A023059 ED 119-273
SR-45/46 January-June 1976 AD A026196 ED 123-678
SR-47 July-September 1976 AD A031789 ED 128-870
SR-48 October-December 1976 AD A036735 ED 135-028
SR-49 January-March 1977 AD A041460 ED 141-864
SR-50 April-June 1977 AD A044820 ED 144-138
SR-51/52 July-December 1977 AD A049215 ED 147-892
SR-53 January-March 1978 AD A055853 ED 155-760
SR-54 April-June 1978 AD A067070 ED 161-096
SR-55/56 July-December 1978 AD A065575 ED 166-757
SR-57 January-March 1979 AD A083179 ED 170-823
SR-58 April-June 1979 AD A077663 ED 178-967
SR-59/60 July-December 1979 AD A082034 ED 181-525
SR-61 January-March 1980 AD A085320 ED 185-636
SR-62 April-June 1980 AD A095062 ED 196-099
SR-63/64 July-December 1980 AD A095860 ED 197-416
SR-65 January-March 1981 AD A099958 ED 201-022
SR-66 April-June 1981 AD A105090 ED 206-038
SR-67/68 July-December 1981 AD A111385 ED 212-010
SR-69 January-March 1982 AD A120819 ED 214-226
SR-70 April-June 1982 AD A119426 ED 219-834
SR-71/72 July-December 1982 AD A124596 ED 225-212
SR-73 January-March 1983 AD A129713 ED 229-816
SR-74/75 April-September 1983 AD A136416 ED 236-753
SR-76 October-December 1983 AD A140176 ED 241-973
SR-77/78 January-June 1984 AD A145585 ED 247-626
SR-79/80 July-December 1984 AD A151035 ED 252-9
SR-81 January-March 1985 AD A156294 ED 25i -w9
SR-82/83 April-September 1985 AD A165084 ED 266-508
SR-84 October-December 1985 AD A168819 ED 270-831
SR-85 January-March 1986 AD A173677 ED 274-022
SR-86/87 April-September 1986 AD A176816 ED 278-066


SR-88 October-December 1986 PB 88-244256 ED 282-278
SR-89/90 January-June 1987 PB 88-244314 ED 285-228
SR-91 July-September 1987 AD A192381 **
SR-92 October-December 1987 PB 88-246798 **
SR-93/94 January-June 1988 PB 89-108765 **
SR-95/96 July-December 1988 PB 89-155329/AS
SR-97/98 January-June 1989 PB 90-121161/AS

AD numbers may be ordered from:

U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22151

ED numbers may be ordered from:

ERIC Document Reproduction Service
Computer Microfilm Corporation (CMC)
3900 Wheeler Avenue
Alexandria, VA 22304-5110

In addition, Haskins Laboratories Status Report on Speech Research is abstracted in Language and Language Behavior Abstracts, P.O. Box 22206, San Diego, CA 92122.

"Accession number not yet assigned


SR-99/100 JULY-DECEMBER 1989

Contents

Haskins Laboratories
Status Report on Speech Research

Coarticulatory Organization for Lip-rounding in Turkish and English

Suzanne E. Boyce 1

Long Range Coarticulatory Effects for Tongue Dorsum Contact in VCVCV Sequences

Daniel Recasens 19

A Dynamical Approach to Gestural Patterning in Speech Production

Elliot L. Saltzman and Kevin G. Munhall 38

Articulatory Gestures as Phonological Units

Catherine P. Browman and Louis Goldstein 69

The Perception of Phonetic Gestures

Carol A. Fowler and Lawrence D. Rosenblum 102

Competence and Performance in Child Language

Stephen Crain and Janet Dean Fodor 118

Cues to the Perception of Taiwanese Tones

Hwei-Bing Lin and Bruno H. Repp 137

Physical Interaction and Association by Contiguity in Memory for the Words and Melodies of Songs

Robert G. Crowder, Mary Louise Serafine, and Bruno H. Repp 148

Orthography and Phonology: The Psychological Reality of Orthographic Depth

Ram Frost 162

Phonology and Reading: Evidence from Profoundly Deaf Readers

Vicki L. Hanson 172

Syntactic Competence and Reading Ability in Children

Shlomo Bentin, Avital Deutsch, and Isabelle Y. Liberman 180

Effect of Emotional Valence in Infant Expressions upon Perceptual Asymmetries in Adult Viewers

Catherine T. Best and Heidi Freya Queen 195

Appendix 211