Aspects of Phonological Fusion

Journal of Experimental Psychology: Human Perception and Performance
1975, Vol. 104, No. 2, 105-120

James E. Cutting
Yale University and Haskins Laboratories, New Haven, Connecticut

Phonological fusion occurs when the phonemes of two different speech stimuli are combined into a new percept that is longer and linguistically more complex than either of the two inputs. For example, when PAY is presented to one ear and LAY to the other, the subject often perceives PLAY. The present article is an investigation of the conditions necessary and sufficient for fusion to occur. The rules governing phonological fusion appear to be the same for synthetic and natural speech, but synthetic stimuli fuse more readily. Fusion occurs considerably more often in dichotic stimulus presentation than in binaural presentation. The phenomenon is remarkably tolerant of differences in relative onset time between the to-be-fused stimuli and of relative differences in fundamental frequency, intensity, and vocal tract configuration. Although phonological fusion is insensitive to such nonlinguistic stimulus parameters, it is sensitive to linguistic variations at the semantic, phonemic, and acoustic levels.

This research was supported by National Institute of Child Health and Human Development Grant HD-01994 to the Haskins Laboratories, and was based in part on a PhD thesis submitted to the Department of Psychology of Yale University. Thanks go to Ruth S. Day, with whom I have never had a colorless conversation; Franklin S. Cooper, with whom I have never had a fruitless conversation; and the members of the New Haven Dance Ensemble, with whom I never talked about this at all.

The author is now at Wesleyan University, Middletown, Connecticut and Haskins Laboratories. Requests for reprints should be sent to James E. Cutting, Haskins Laboratories, 270 Crown Street, New Haven, Connecticut 06510.

Most of the dichotic listening literature has dealt with the phenomenon of perceptual rivalry. Given a different stimulus presented to each ear at the same time, the subject typically reports hearing one or both of them. The different information contained in each stimulus is not combined into a single percept. Thus, for example, given the dichotic digits FOUR/SIX, the subject never reports hearing SORE or FIX. Perceptual fusion does occur when certain variables are taken into account. In several types of fusion phenomena, the stimulus variables that facilitate fusion appear to be psycholinguistic in nature. For example, given the dichotic pair PAHDUCT/RAHDUCT, the subject often reports hearing PRODUCT (Day, 1968). In this type of fusion, segments of both stimuli are combined to form a new percept that is longer and linguistically more complex than either of the two inputs.

This phenomenon is called phonological fusion because it conforms to phonological rules of English: Given BANKET/LANKET, the subject reports hearing BLANKET, not LBANKET (Day, Note 1). Nonword fusions also occur; for example, GAB/LAB yields GLAB, not LGAB (Day, 1968). According to phonological rules of cluster information in English, initial stop consonant + liquid clusters are allowed, but initial liquid + stop clusters are not. Day (1970) found that when stimuli are presented dichotically with these constraints removed, fusion occurs in both directions. Given the stimuli TASS/TACK, for example, the subject reports hearing TASK on some trials and TACKS on others. Both /sk/ and /ks/ clusters are permissible in final position in English.

Phonological fusion cannot be explained as a response bias for acceptable English words. Day (1968) found that when different productions of PAHDUCT are presented to each ear, subjects report hearing PAHDUCT. That is, they do not report hearing the acceptable English word that corresponds most closely to the nonword inputs. Likewise, RAHDUCT/RAHDUCT yields


RAHDUCT. Only when the stimuli are PAHDUCT/RAHDUCT does fusion occur, and it occurs regardless of which stimulus is presented to which ear. In the present article, I will consider a fusion response to occur when a stimulus pair, each item of which has n phonemes, yields a percept of n + 1 phonemes (following Day, 1968).
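As a minimal sketch of this scoring criterion, the check below counts a response as a fusion when both inputs have n phonemes and the response has n + 1. The phoneme transcriptions and the function name `is_fusion_response` are illustrative assumptions, not part of the original scoring procedure.

```python
def is_fusion_response(stim_a, stim_b, response):
    """Return True when two n-phoneme inputs yield an (n + 1)-phoneme percept.

    Each argument is a list of phoneme symbols (hypothetical transcriptions).
    """
    n = len(stim_a)
    if len(stim_b) != n:
        return False  # the criterion assumes inputs of equal length
    return len(response) == n + 1

# PAY/LAY -> PLAY: two 2-phoneme inputs, one 3-phoneme percept
pay, lay, play = ["p", "e"], ["l", "e"], ["p", "l", "e"]
print(is_fusion_response(pay, lay, play))  # fusion
print(is_fusion_response(pay, lay, pay))   # not a fusion
```

The check looks only at phoneme counts, as the definition does; it does not verify which phonemes appear in the response.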

The studies presented here are an analysis of phonological fusion and the conditions necessary and sufficient for fusion to occur. Of primary interest are the modes of presentation that optimize the occurrence of fusion responses and the stimulus variables under those conditions that enhance and detract from the fused percept. Both linguistic and physical (nonlinguistic) variables are of interest.

To undertake such a project, one must be able to control the stimuli in a precise manner, and such control is only available with a computer-driven speech synthesizer. However, synthetic speech has a characteristic "cold-in-the-nose" quality which may affect phonological fusion. To ensure that the present studies are not concerned with artifacts inherent to synthetic speech, fusion was first observed for synthetic and natural speech tokens of the same items.

SYNTHETIC VERSUS NATURAL SPEECH

Experiment 1: Baseline Data

General Method

Tape preparation. Four sets of stimuli of the same general pattern were selected: the PAY set (PAY, LAY, RAY); the BED set (BED, LED, RED); the CAM set (CAM, LAMB, RAM); and the GO set (GO, LOW, ROW). Two versions of each stimulus were prepared, one in synthetic speech and one in natural speech. Synthetic stimuli were prepared on the Haskins Laboratories' parallel-resonance synthesizer. All stimuli were identical in pitch contour, rising from 116 to 124 Hz and then falling to 82 Hz. Within each set all stimuli were identical in duration (350, 450, 400, and 350 msec for the four sets, respectively), and all had the same acoustic structure except for the initial 150 msec. Liquid stimuli within each set (those beginning with /l/ and /r/) differed only in the direction and extent of the third-formant transition. This acoustic cue distinguishes the two liquids (O'Connor, Gerstman, Liberman, Delattre, & Cooper, 1957). Each liquid stimulus began with 50-msec steady state prerelease formant resonances, followed by 100-msec transitions in each formant gliding to the resting frequency of the following vowel. Stop consonants began with 50 msec of formant transitions gliding into the following vowel. Voice onset time was 0 msec for voiced consonants and +50 msec for voiceless consonants. Natural speech versions of the same items were spoken by the author and recorded on audio tape. An effort was made to match utterances in terms of pitch, intensity, and duration, but some variation was unavoidable. Both synthetic and natural speech stimuli were digitized and stored on disk file for preparation of diotic and dichotic tapes (Cooper & Mattingly, 1969).

Subjects. Sixteen Yale University undergraduates participated in identification and fusion tasks. Each received course credit for his or her efforts. As in all experiments in this paper, subjects were right-handed native American English speakers with no history of hearing trouble and with no experience listening to synthetic speech. They listened in groups of four to tapes played on an Ampex AG-500 dual-track tape recorder. Auditory signals were sent through a listening station to matched Telephonics earphones (Model TDH39). Gains on the tape recorder and listening station were adjusted so that stimuli were presented at approximately 80 db. (re 20 μN/m²). These criteria and procedures were used for all experiments in the present article. No subject served in more than one experiment. The fusion task in the present study preceded the identification task, but it is more instructive to consider them in reverse order.

Task 1: Identification of Individual Stimuli

All stimuli, natural and synthetic, were recorded on a diotic identification tape. The tape consisted of a random sequence of 120 trials: (4 sets of stimuli) × (2 versions of each set: natural and synthetic) × (3 stimuli per set) × (5 observations per stimulus). Subjects wrote down the entire word that they heard presented. There was a 3-sec interval between items.

Results

All stimuli were highly identifiable. Each synthetic and natural speech item was correctly identified on better than 93% of all trials. Cluster responses (such as PLAY) were reported on less than 2% of all trials. Identification tasks were a part of each experiment presented in this article. Since the stimuli and results generally did not differ from those in Experiment 1, the identifications are not discussed in all experiments.

Task 2: Identification of Fusible Pairs

All stimuli used in the identification task were also used in the fusion task. However, instead of presenting one stimulus at a time, two stimuli were presented, one to each ear. The dichotic pair consisted of a stop stimulus and a liquid stimulus from the same set; for example, PAY/LAY or PAY/RAY. All stimuli and possible fusions were monosyllabic, high-frequency English words: PAY/LAY → (yields) PLAY, PAY/RAY → PRAY, BED/LED → BLED, BED/RED → BREAD, CAM/LAMB → CLAM, CAM/RAM → CRAM, GO/LOW → GLOW, and GO/ROW → GROW. Most of these items and fusions have Thorndike-Lorge (1944) word frequencies of at least 100 per million. No alveolar stop-consonant stimuli were chosen because /tl/ and /dl/ clusters do not appear in initial position in English.

Two dichotic tapes were prepared, one using synthetic stimuli and one using natural speech stimuli. Three lead times were selected: The plosive portion of the stop-initial stimulus began 50 msec before the onset of the liquid-initial stimulus, the stop and liquid began simultaneously, or the liquid began 50 msec before the stop. Day (1968) and Day and Cutting (Note 2) found that within such a restricted range, lead time has little effect on fusion. The pulse code modulation system at the Haskins Laboratories allows temporal alignments to within 0.5-msec accuracy (Cooper & Mattingly, 1969). There were 96 randomly ordered items per tape: (4 sets of fusible stimuli) × (2 liquids per set) × (3 lead times) × (2 channel arrangements per pair) × (2 observations per dichotic item). All subjects listened to both types of stimuli: Half listened first to the synthetic speech pairs and then to the natural speech pairs, while the other half listened in the reverse order. Subjects were instructed to write down what they heard on each trial (one word or two, real words or nonsense) and to respond after each trial. Before the task began, they listened to several practice pairs and wrote their responses in order to familiarize themselves with the task and the stimuli.
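The factorial structure of one dichotic tape can be sketched by enumerating its cells. This is only an illustration of the item count: the channel labels and the variable names are assumptions, and the actual tapes used a random order that is not reproduced here.

```python
from itertools import product

# Stop stimulus and liquid stimuli for each set, as listed in the General Method
sets = {
    "PAY": ("PAY", ["LAY", "RAY"]),
    "BED": ("BED", ["LED", "RED"]),
    "CAM": ("CAM", ["LAMB", "RAM"]),
    "GO": ("GO", ["LOW", "ROW"]),
}
lead_times_msec = [+50, 0, -50]          # positive: stop leads; negative: liquid leads
channels = ["stop-left", "stop-right"]   # hypothetical labels for the two arrangements
observations = [1, 2]

trials = [
    (stop, liquid, lead, channel, obs)
    for stop, liquids in sets.values()
    for liquid, lead, channel, obs in product(
        liquids, lead_times_msec, channels, observations
    )
]
print(len(trials))  # 96
```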

Results and Discussion

Fusion responses occurred readily, but were more frequent for synthetic stimuli than for natural speech stimuli. Subjects fused synthetic pairs on 61% of all trials, while fusing on only 31% of all natural speech trials. This 2:1 ratio was highly significant by a sign test: All 16 subjects showed results in this direction (z = 3.75, p < .0001). The order in which subjects listened to synthetic and natural speech pairs was not a significant factor.
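The reported value of 3.75 is what a normal approximation to the sign test with a continuity correction gives for 16 of 16 subjects. The article does not state which formula was used, so the sketch below is an assumption about the computation, and the function names are illustrative.

```python
import math

def sign_test_z(n_positive, n_total):
    """Normal approximation to the sign test with a continuity correction."""
    expected = n_total / 2
    sd = math.sqrt(n_total) / 2
    return (abs(n_positive - expected) - 0.5) / sd

def upper_tail_p(z):
    """One-tailed p value for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = sign_test_z(16, 16)
print(z)                 # 3.75
print(upper_tail_p(z))   # below .0001
```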

In general, fusion was more frequent for stop + /l/ pairs than for stop + /r/ pairs (e.g., PAY/LAY versus PAY/RAY), as shown in Table 1. This pattern was reported previously by Day (1968). Anomalous fusion responses frequently occurred for both synthetic and natural speech pairs. The most common was a stop + /l/ fusion response for a stop + /r/ stimulus pair: for example, PAY/RAY → PLAY. Whereas /l/ was substituted for /r/ quite frequently, the reverse substitution was much less frequent. Day (1968) and Cutting and Day (1975) have reported this /l/-for-/r/ substitution phenomenon. It cannot be accounted for by the frequency of these clusters in English. In this study, /l/-for-/r/ substitutions occurred for both synthetic and natural speech stimuli at approximately the same rate. The relatively high frequency of these responses in the present study seems unusual, since Day (1968) found them to occur on only 4% of all fusions of stop + /r/ pairs. The phenomenon will be considered in more detail in Experiments 8 and 10.

TABLE 1
PERCENT FUSION RESPONSES FOR EACH PAIR IN EXPERIMENT 1, IN BOTH SYNTHETIC SPEECH AND NATURAL SPEECH VERSIONS

                     Percent fusion
Fusible pair     Synthetic   Natural    M
PAY/LAY             78         41       59
PAY/RAY             70         33       51
  PAY set                              (55)
BED/LED             48         33       40
BED/RED             46         19       33
  BED set                              (37)
CAM/LAMB            63         29       46
CAM/RAM             57         17       37
  CAM set                              (42)
GO/LOW              68         41       54
GO/ROW              60         35       47
  GO set                               (51)
Stop + /l/          64         36       50
Stop + /r/          58         26       42
M                  (61)       (31)

Note. Values in parentheses are the means of the means.
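As a consistency check on Table 1, the column means for stop + /l/ and stop + /r/ pairs can be recomputed from the individual cells. The sketch assumes that the first CAM row (63, 29) belongs to CAM/LAMB, which is the assignment that matches the printed column means; the dictionary layout and names are illustrative.

```python
# Percent fusion from Table 1: (synthetic, natural) for each fusible pair
table1 = {
    "PAY/LAY": (78, 41), "PAY/RAY": (70, 33),
    "BED/LED": (48, 33), "BED/RED": (46, 19),
    "CAM/LAMB": (63, 29), "CAM/RAM": (57, 17),  # row assignment is an assumption
    "GO/LOW": (68, 41), "GO/ROW": (60, 35),
}
l_pairs = ["PAY/LAY", "BED/LED", "CAM/LAMB", "GO/LOW"]
r_pairs = ["PAY/RAY", "BED/RED", "CAM/RAM", "GO/ROW"]

def column_mean(pairs, col):
    """Mean percent fusion over the given pairs (col 0: synthetic, 1: natural)."""
    return sum(table1[p][col] for p in pairs) / len(pairs)

for label, pairs in (("stop + /l/", l_pairs), ("stop + /r/", r_pairs)):
    print(label, column_mean(pairs, 0), column_mean(pairs, 1))
```

The recomputed means (64.25 and 36 for stop + /l/, 58.25 and 26 for stop + /r/) agree with the printed values of 64, 36, 58, and 26 after rounding.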

Frequency of fusion responses differed across sets of stimuli. Table 1 shows the fusion rates for the four sets in both synthetic and natural speech versions. Fusion rates for the PAY and GO sets were consistently higher than those for the BED and CAM sets. These differences may be related to the frequency of occurrence of the fused responses as words in English; according to Thorndike and Lorge (1944) and Carroll, Davies, and Richman (1971), the words PLAY, PRAY, GLOW, and GROW occur more frequently in general publications than BLED, BREAD, CLAM, and CRAM. Day (1968) has shown that fusions occur more frequently when the fused outcome is an acceptable English word than when it is not (although nonword fusions do occur, e.g., GORIGIN/LORIGIN → GLORIGIN). Perhaps within the domain of words, frequency of occurrence in the natural language also plays a role in fusion. There were no ear effects or channel effects in the present experiment or in any experiment described in this article. There were no significant effects of lead time configuration here, but see Experiment 3.

Overview of Experiment 1

Although there were differences in fusion between synthetic and natural speech stimuli, the rules that govern their fusibility appear to be comparable. This conclusion is based on two distinct results. First, the pattern of fusions for the four sets of stimuli was quite similar for both natural and synthetic speech stimuli. In both cases the PAY and GO sets fused more readily than the CAM and BED sets. Second, the pattern of "misperceptions" for the two types of stimuli was almost identical. The phoneme /l/ was quite frequently substituted for /r/ on a fusion response; for example, PAY/RAY → PLAY. The reverse substitution was much less common. Thus, it appears that generalizations can be made about aspects of phonological fusion from the results of tasks using synthetic stimuli.

ASPECTS OF STIMULUS PRESENTATION

Experiment 2: Dichotic Versus Binaural Presentation

Method

The synthetic stimulus tapes used in Experiment 1 also were employed here. Twenty-four different Yale University undergraduates listened to dichotic pairs, in which each item was presented to a separate ear, and to binaural pairs, in which items were electrically mixed and presented to both ears. Eight subjects listened first to a sequence of dichotic trials, then to a sequence of binaural trials (Group 1), while eight others listened in reverse order (Group 2). For these subjects binaural pairs were presented by mixing the two channels of the dichotic tape. The remaining eight subjects listened to a new tape of dichotic and binaural trials randomly intermixed (Group 3). This tape was exactly twice as long (192 items) as the tapes of Experiment 1, and the dichotic and binaural pairs were constructed digitally by computer at the time of recording.

Results and Discussion

Fusion was more frequent for dichotic presentations than for binaural presentations: 45% and 15%, respectively. This 3:1 ratio was highly significant (z = 4.8, p < .0001): All 24 subjects yielded fusion results in this direction. Other than this main effect, there were no statistically reliable differences in fusion. For example, the dichotic fusion rates for Groups 1, 2, and 3 were 53%, 38%, and 45%, respectively; corresponding binaural fusion rates were 16%, 8%, and 23%. Large variances in fusion rate within groups kept these differences from revealing any other statistically reliable pattern. As in Experiment 1, /l/-for-/r/ phoneme substitutions occurred readily, and there was only a slight effect of lead time. Such differences are considered in more detail in Experiment 3. Differences in fusion rate among stimulus sets were similar to those in Experiment 1.

More frequent fusion in dichotic than in binaural presentation demonstrates that phonological fusion is an active and central process. "Pre-fused," or electrically mixed, synthetic stimuli are not perceived as a phonological mixture of the two items. The reason fusion is so infrequent in the binaural case presumably stems from the fact that the electrically mixed stimuli beat against one another, creating a very "dirty" signal. No such contamination occurs for dichotic pairs. Unique left-ear and right-ear stimuli appear to be perceptually combined at some central locus.

Experiment 3: Variation in Lead Time

Day (1970; Note 1) and Day and Cutting (Note 2) demonstrated that variation in the alignment of natural speech fusible stimuli has little effect on fusion rate. Fusion occurs almost as readily when the liquid stimulus (e.g., LANKET) leads the stop stimulus (BANKET) by as much as 150 msec, as when the stop leads the liquid by the same extent. Moreover, fusion rate in both instances is nearly identical to that for the simultaneous onset case. This result is particularly surprising when one considers that neither the segment /b/ in BANKET nor /l/ in LANKET was longer than 90 msec in duration. In Experiments 1 and 2, lead times of 0 msec, +50 msec (when the stop leads), and −50 msec (when the liquid leads) yielded nearly identical fusion rates. The present experiment was designed, in part, to investigate relative onsets (lead times) of longer than 150 msec to discover the interval at which fusion disintegrates.

In addition, Experiment 3 explores the relationship between fusion and backward masking. Consider the dilemma in the comparison of two hypothetical dichotic pairs. Imagine first the nonfusible stimulus pair PAY/DAY. If one staggers their onsets by 50 msec, the second item will most likely mask the first (see Pisoni, 1973, and Studdert-Kennedy, Shankweiler, & Schulman, 1970, among many others). Thus, if DAY leads PAY, the subject may report hearing only PAY. In a fusible pair, PAY/LAY, the later-arriving item will not mask the first. Even if LAY begins 50 msec before PAY, the subject seldom reports hearing PAY alone, but usually fuses them into PLAY. This response in this particular situation is the crux of a dilemma. The segments /p/ and /l/ are not only combined into a cluster but also perceptually reordered in time: /l/ began before /p/ in the stimuli, but in the response they are reversed. The rules of language dictate the perception of PLAY, whereas the rules of backward masking and central processing in general predict the perception of PAY. Perhaps varying relative onset times beyond 50 and even 150 msec in the phonological fusion situation would strain fusion and favor backward masking.

[FIGURE 1. Comparison of fusion rate and number of one-item responses at 11 lead times in Experiment 3. Vertical axis: percent responses; horizontal axis: lead time (msec), from stop led to liquid led. Curves: one-item responses and fusion responses.]

Method

The PAY set (PAY, LAY, RAY) was again employed. In addition, the KICK set (KICK, LICK, RICK) was generated on the Haskins synthesizer. The PAY set stimuli were 350 msec in duration, and the KICK set stimuli 325 msec. Within a set the items were identical in all respects except for the initial formant transitions requisite for the perception of a stop consonant versus /r/ versus /l/. Dichotic pairs were assembled for all combinations of fusible items within a set: PAY/LAY → PLAY, PAY/RAY → PRAY, KICK/LICK → CLICK, and KICK/RICK → CRICK. Eleven lead times were chosen: 0, ±50, ±100, ±200, ±400, and ±800 msec. Since all stimuli were 350 msec or less in duration, the 400- and 800-msec relative onset items involved temporally nonoverlapping stimuli. All pairs were represented in two tapes, each with a different random order. Each tape consisted of 88 pairs: (2 sets of stimuli) × (2 stop/liquid pairs per set) × (11 lead times) × (2 channel arrangements per pair). Eight Yale University undergraduates served as subjects for course credit. Each subject listened to both tapes, yielding 176 trials per subject.
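The design of one Experiment 3 tape, and the point about nonoverlapping stimuli, can be sketched as follows. The overlap rule simply compares the absolute lead time with the longest stimulus duration; the channel labels and helper name are assumptions for illustration.

```python
from itertools import product

lead_times_msec = [0, 50, -50, 100, -100, 200, -200, 400, -400, 800, -800]
pairs = [("PAY", "LAY"), ("PAY", "RAY"), ("KICK", "LICK"), ("KICK", "RICK")]
channels = ["stop-left", "stop-right"]  # hypothetical labels
max_duration_msec = 350                 # longest stimulus (PAY set)

def overlaps(lead_msec, leading_duration_msec=max_duration_msec):
    """True if the lagging item starts before the leading item has ended."""
    return abs(lead_msec) < leading_duration_msec

tape = list(product(pairs, lead_times_msec, channels))
print(len(tape))  # 88 pairs per tape
print([lead for lead in lead_times_msec if not overlaps(lead)])
# [400, -400, 800, -800]
```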

Results and Discussion

As shown in Figure 1, fusion occurred most readily at short lead times, especially when the stop stimulus led the liquid by 50 and 100 msec. Here, fusion responses occurred on 63% and 59% of all trials, respectively. Fusions were infrequent at the −200, ±400, and ±800 msec leads, where they occurred on an average of less than 10% of the time. Intermediate fusion rates occurred at the 0, −50, −100, and +200 msec leads. Both sets of stimuli fused readily, with fusions slightly more frequent for the PAY set.

The fusion rate at the short leads (0 and ±50 msec) in the present experiment is similar to that found in Experiments 1 and 2. One would not expect fusion to occur at the longer lead times where members of a pair do not overlap in time, but one might expect that they would depress fusion response rates at short leads. The absence of a decrease suggests that fusions are not a response bias resulting from lack of knowledge about the stimuli. The fusions that did occur at long leads occurred primarily at the beginning of the task when subjects were less familiar with the items. Fusions at shorter leads continued at the same rate throughout the task. Again, /l/-for-/r/ substitutions were frequent.

Fusion and Backward Masking

To consider the conflict between phonological rules and the rules of backward masking, it is necessary to observe more than just fusion responses. For example, not only are there fusion and nonfusion responses, but there are one-item and two-item responses as well. These four categories are not mutually exclusive. Consider the stimulus pair PAY/LAY: There are one-item fusions, PLAY; one-item nonfusions, PAY or LAY; two-item fusions (responses in which at least one fusion occurs), PLAY + PAY, PLAY + LAY, or even PLAY + PLAY; and of course two-item nonfusions, PAY + LAY. All of these responses occurred in the present study, but at different frequencies according to lead times. As might be expected, one-item responses predominated at the short leads while two-item responses occurred at the long leads. In Figure 1, the one-item response curve is superimposed on the fusion curve. Notice that it is nearly symmetrical, unlike the fusion response curve.
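The two crossed distinctions (one-item versus two-item, fusion versus nonfusion) can be sketched as a small classifier over written responses. This is a simplification under stated assumptions: it expects one or two words per response, it treats only the expected fused word as a fusion (so anomalous fusions such as PLAY for PAY/RAY are not handled), and the function name is illustrative.

```python
def categorize(response_words, fused_word):
    """Classify a written response by item count and presence of a fusion."""
    n_items = "one-item" if len(response_words) == 1 else "two-item"
    kind = "fusion" if fused_word in response_words else "nonfusion"
    return f"{n_items} {kind}"

# Example responses to the dichotic pair PAY/LAY
for response in (["PLAY"], ["PAY"], ["PLAY", "LAY"], ["PAY", "LAY"]):
    print(response, "->", categorize(response, "PLAY"))
```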

Except for the 200-, 400-, and 800-msec relative onsets, the fusion curve is primarily a subset of the single response curve. That is, between ±100 msec the subject typically wrote down a single response, and often that response was a fusion (e.g., PLAY). When the subject wrote down a single response that was not a fusion, 91% of those responses were stop stimuli alone (e.g., PAY). Thus, the area between the two curves in Figure 1 is primarily a representation of the number of PAY or KICK responses.

Between +200- and 0-msec lead times, phonological rules dominated. Most one-item responses within this lead-time domain were fusions. However, between 0- and −200-msec leads, two types of single-item responses occurred at near equal frequencies. One-item fusions, PLAY, demonstrate the influence of language rules, whereas one-item nonfusions, PAY, demonstrate backward masking and the influence of central processing constraints. Within this lead-time domain, 58% of all one-item responses are fusions and 42% are nonfusions. In Experiments 1 and 2 phonological rules always dominated over the rules of backward masking. In the present experiment, in which lead times were extended to more than double the stimulus duration, backward masking did occur, but occurred only for stimulus pairs in which the liquid stimulus led the stop by less than 200 msec; even in pairs like LAY-before-PAY, backward masking occurred less than half the time. In PAY-before-LAY pairs, there was essentially no backward masking, there were few LAY responses, and PLAY dominated at all the short leads.

Although differences in lead time of greater than about 200 msec reduced fusion to a minimum, fusion occurred within the ±100-msec domain with considerable frequency. Compare this domain with the ±2-msec tolerance of relative onset differences for sound localization, another form of auditory fusion, and one sees that the tolerance for lead-time differences in phonological fusion is several orders of magnitude greater.

Overview of Experiments 2 and 3

Mode of stimulus presentation has considerable effect on fusion; dichotic fusions of synthetic speech stimuli are much more frequent than binaural fusions, suggesting that central processes are much more important than peripheral processes in phonological fusion. The fact that fusion is relatively insensitive to lead-time variation lends credibility to this view. In visual masking, for example, central processes appear to operate over a longer time span than peripheral processes (Turvey, 1973). If phonological fusion is a central phenomenon, perhaps certain physical parameters of the stimuli would have little effect on the frequency at which fusion responses occur, just as they have little effect on central processes in vision (Turvey, 1973).

PHYSICAL ASPECTS OF STIMULI

Experiments 4-6: Pitch, Intensity, and Vocal Tract Size

Method

The same stimulus sets were used as in Experiment 3: the PAY set (PAY, LAY, RAY) and the KICK set (KICK, LICK, RICK). For Experiment 4, two versions of each stimulus were synthesized, one at a relatively high fundamental frequency (pitch) and one at a relatively low pitch. All stimuli had a falling pitch contour. High-pitch stimuli began at 140 Hz and fell to 120 Hz, whereas low-pitch stimuli began at 120 Hz and fell to 100 Hz. For Experiment 5, all the low-pitch stimuli of Experiment 4 were employed, but they were synthesized at two relative intensities, 80 db. and 65 db. re 20 μN/m². For Experiment 6, all low-pitch high-intensity stimuli were employed, but they were synthesized as if spoken by two different speakers: one a normal adult male and another a speaker whose apparent vocal tract size was only 5/6 as large as the adult male. Because these stimuli had a low pitch, the tokens from the smaller vocal tract configuration sounded as if they were produced by a male midget. Variation in apparent vocal tract size was accomplished by elevating the formant frequencies of the PAY and KICK set stimuli by a factor of 6/5. All stimuli were highly identifiable, as determined by identification tests.
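The vocal tract manipulation amounts to multiplying every formant frequency by the inverse of the size ratio. The sketch below illustrates that arithmetic only; the formant values are hypothetical placeholders, not the synthesis parameters used in the study, and `scale_formants` is an illustrative name.

```python
from fractions import Fraction

def scale_formants(formants_hz, tract_ratio=Fraction(5, 6)):
    """Raise formant frequencies by the inverse of the vocal tract size ratio."""
    factor = 1 / tract_ratio  # a 5/6 tract size multiplies formants by 6/5
    return [float(f * factor) for f in formants_hz]

# Hypothetical steady-state formants for an adult male vowel (Hz)
large_tract = [500, 1500, 2500]
print(scale_formants(large_tract))  # [600.0, 1800.0, 3000.0]
```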

Dichotic pairs were assembled from all possible combinations of fusible items within each set. Consider the pair PAY/LAY in Experiment 4. The members of the pair shared the same pitch, PAY-high (for high pitch)/LAY-high and PAY-low/LAY-low, or they differed in pitch, PAY-high/LAY-low and PAY-low/LAY-high. The same pattern was followed in Experiment 5, except that relative intensity was substituted for relative pitch. In Experiment 6 the pattern was similar. Paired items either shared vocal tract size (PAY-large/LAY-large and PAY-small/LAY-small) or they differed in vocal tract size (PAY-large/LAY-small and PAY-small/LAY-large). Two dichotic tapes with different random orders were prepared for all three experiments. Each tape contained 96 items: (2 sets of stimuli) × (2 stop/liquid pairs per set) × (4 pitch, intensity, or vocal tract combinations) × (3 lead times) × (2 channel arrangements per pair). The leads used in all three experiments were 0 and ±50 msec. Twelve different Yale University undergraduates served as subjects in each of the three experiments.
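The 96-item count follows from fully crossing the five factors listed above. A minimal sketch of that crossing is given here for illustration; the label strings are placeholders chosen for this example and do not appear in the original materials.

```python
from itertools import product

# Factor levels as described in the Method. All label strings are illustrative.
stimulus_sets = ["PAY", "KICK"]
liquids = ["L", "R"]  # two stop/liquid pairs per set, e.g. PAY/LAY and PAY/RAY
combinations = ["same-A", "same-B", "differ-AB", "differ-BA"]  # e.g. high/high, low/low, high/low, low/high
lead_times_msec = [-50, 0, 50]
channel_arrangements = ["stop-left", "stop-right"]

# One tape is the full crossing of all factors (before randomizing the order).
trials = list(product(stimulus_sets, liquids, combinations,
                      lead_times_msec, channel_arrangements))
print(len(trials))  # 2 * 2 * 4 * 3 * 2
```

The same crossing applies to Experiments 4, 5, and 6; only the meaning of the four "combinations" changes (pitch, intensity, or vocal tract size).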

Results

Fusion occurred readily for all stimulus pairs. In Experiment 4 the fusion rate was identical for pairs that shared the same pitch and those that differed in pitch: 50% each. Fusions for all four types of pairs were within a few percentage points of one another. The pattern was the same in Experiments 5 and 6: Pairs with the same intensity and pairs with different intensities all fused at a rate of 36%, and pairs with the same vocal tract configuration and those with different vocal tract configurations fused at a rate of 59% each. In other words, within each experiment there was neither an increment nor a decrement in fusion rate attributable to variation of pitch, intensity, or vocal tract size.

Lead time had a nominal effect on fusion, but there were no interactions with any other factor. The fusion rates for the +50, 0, and −50 msec lead times were, respectively, 55%, 48%, and 48% in Experiment 4; 40%, 35%, and 34% in Experiment 5; and 65%, 56%, and 57% in Experiment 6. The /l/-for-/r/ substitutions were frequent in all three studies, and the PAY set fused at a slightly higher rate than the KICK set.
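As a quick consistency check, the lead-time rates can be averaged and compared with the overall rates reported for each experiment (50%, 36%, and 59%). The sketch below assumes equal numbers of trials at each lead time, which the tape design implies; the variable names are illustrative.

```python
# Fusion rates (percent) at the +50, 0, and -50 msec lead times, as reported.
rates_by_lead = {
    "Experiment 4": (55, 48, 48),
    "Experiment 5": (40, 35, 34),
    "Experiment 6": (65, 56, 57),
}
# Overall fusion rates (percent) reported for the same experiments.
reported_overall = {"Experiment 4": 50, "Experiment 5": 36, "Experiment 6": 59}

mean_by_experiment = {
    name: sum(rates) / len(rates) for name, rates in rates_by_lead.items()
}
for name, mean_rate in mean_by_experiment.items():
    print(name, round(mean_rate, 1), "vs reported", reported_overall[name])
```

Under that assumption the unweighted means agree with the reported overall rates to within rounding.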

Experiment 7: Multidimensional Stimulus Variation

Method

The PAY set and the KICK set were synthesized in eight different versions: (2 pitches, those frequencies used in Experiment 4) × (2 intensities, those used in Experiment 5) × (2 vocal tract configurations, those used in Experiment 6). One dichotic tape consisted of items within a set that shared none of the values in the dimensions of pitch, intensity, and vocal tract size. Thus, for example, PAY-low(pitch)-low(intensity)-small(vocal tract) was paired with LAY-high-high-large. This tape consisted of 96 items: (2 sets of stimuli) × (2 liquids per set) × (3 lead times) × (8 different mismatched pairings, across the dimensions of pitch, intensity, and vocal tract size). Channel arrangements of the stimuli were arranged such that stops and liquids were equally represented on both channels. A second 96-item dichotic tape was prepared to consist of fusible items that were all low-pitch, high-intensity, large vocal tract stimuli: (2 sets of stimuli) × (2 liquids per set) × (3 lead times) × (2 channel arrangements) × (4 observations per pair). Twenty-four Yale University undergraduates listened to both fusion tapes. Half the subjects listened first to the multidimensionally varied tape and then to the unvaried tape (Group 1), whereas the other half listened in reverse order (Group 2).
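The eight mismatched pairings arise because each of the eight stop versions has exactly one liquid partner that differs from it on all three dimensions. The following is a minimal sketch of that enumeration; the function and variable names are hypothetical and chosen only for illustration.

```python
from itertools import product

PITCH = ("low", "high")
INTENSITY = ("low", "high")
TRACT = ("small", "large")

def other_level(value, levels):
    """Return the level of a two-level dimension that is not `value`."""
    return levels[1] if value == levels[0] else levels[0]

# Each stop version is paired with the liquid version that differs on every dimension.
mismatched_pairings = []
for pitch, intensity, tract in product(PITCH, INTENSITY, TRACT):
    stop_version = (pitch, intensity, tract)
    liquid_version = (other_level(pitch, PITCH),
                      other_level(intensity, INTENSITY),
                      other_level(tract, TRACT))
    mismatched_pairings.append((stop_version, liquid_version))

# Varied tape: sets x liquids x lead times x mismatched pairings.
varied_tape = list(product(["PAY", "KICK"], ["L", "R"], [-50, 0, 50],
                           mismatched_pairings))
print(len(mismatched_pairings), len(varied_tape))
```

The example from the text, PAY-low-low-small paired with LAY-high-high-large, is the first pairing this loop produces.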

Results

Differences in pitch, intensity, and vocal tract size among the fusible stimuli had remarkably little effect on fusion. Overall fusion rate was 47% for the multidimensionally varied pairs and 44% for the unvaried pairs, a result that shows the reverse trend of what might have been expected. The two groups of subjects differed somewhat in fusion rate for each class of stimulus pairs. Group 1 fused on 46% of all varied trials and on only 38% of the subsequent unvaried trials. This trend was not significant; only 7 out of 12 subjects showed this pattern. Group 2 fused on 49% of the unvaried trials and on 45% of the subsequent varied trials. This trend was not significant either; again, 7 out of 12 subjects yielded results in this direction. The accustomed nominal effect of lead times and the "usual" amount of /l/-for-/r/ substitutions were found in the data. Again, PAY-set pairs fused somewhat more often than KICK-set pairs.

Overview of Experiments 4-7

Within the relatively large ranges of values explored in the dimensions of pitch (20 Hz), intensity (15 db.), and vocal tract size (a ratio of 5:6), the physical aspects of the signal have remarkably little effect on fusion. Even when the three dimensions vary, such stimulus pairs as PAY-low-low-small/LAY-high-high-large fuse as readily as PAY-low-low-small/LAY-low-low-small. Taken together with the results of Experiments 2 and 3, the results of these experiments strongly suggest that the perceptual combination of the stimuli occurs after the linguistic information is extracted from each of the signals. Otherwise, physical variation would surely affect the rate at which fusion occurs.

LINGUISTIC ASPECTS OF STIMULI

Three linguistic levels are considered in the remaining experiments: the level of semantics, the level of the phoneme, and the level of acoustic structure as it pertains to language. The phoneme level might appear to be the primary linguistic level at which phonological fusion occurs, since phonemes are fused into clusters. However, this need not be the case. Therefore two other levels were chosen, one higher (semantics) and one lower (acoustics) than the phoneme level. The experiments begin at the semantic level and move "downward" to the phoneme and acoustic levels.

Experiment 8: Semantic Level: Sentence Contexts

Day (1968) found that semantic cues at the word level influenced fusion rate. Fusion rates were higher when the fused outcomes were real words than when they were nonwords (PAHDUCT/RAHDUCT → PRODUCT vs. PAHLOW/RAHLOW → PRAHLOW). Nonword fusions did occur (GORIGIN/LORIGIN → GLORIGIN), although at a reduced rate.

The present experiment was designed to observe the effects of semantic cues at the sentence level on fusion rate. Since Experiments 1-7 found that /l/ was frequently substituted for /r/ in fusion responses, the present study was also designed to observe the effect of semantic context on /l/-for-/r/ substitutions.

Method

Two sets of stimuli were used, the PAY set (PAY, LAY, RAY) and the GO set (GO, LOW, ROW). Fusible pairs were presented in isolation, as in previous experiments, and imbedded in sentence contexts. The PAY set appeared in the context THE TRUMPETER ___S FOR US (Sentence Frame 1) and THE MINISTER ___S FOR US (Sentence Frame 2), whereas the GO set appeared in THE COALS ARE ___ING AGAIN (Sentence Frame 3) and THE TREES ARE ___ING AGAIN (Sentence Frame 4). These sentences were made into dichotic pairs such that THE TRUMPETER PAYS FOR US, for example, was presented to one ear and THE TRUMPETER LAYS FOR US to the other. All fusible pairs appeared in both semantically probable and improbable contexts. Thus, when PAY/LAY appears in Sentence Frame 1, the response might be THE TRUMPETER PLAYS FOR US, a reasonable sentence; but when the same pair appears in Sentence Frame 2, the response might be THE MINISTER PLAYS FOR US, a semantically less probable sentence. A reverse pattern of expected results follows from PAY/RAY pairs, and Sentence Frames 3 and 4 with GO-set stimuli parallel Sentence Frames 1 and 2.

Two tapes were prepared, one with fusible targets imbedded in sentences and the other with target pairs presented in isolation. The sentence tape consisted of 64 items: (4 sentence frames) × (2 stop/liquid pairs per set) × (2 channel arrangements) × (4 observations per sentence). Dichotic sentence pairs were presented at a simultaneous onset with 12 sec between trials. Subjects wrote down the entire sentence they heard. The isolated-pair tape also had 64 pairs: (2 sets of stimuli) × (2 stop/liquid pairs per set) × (2 channel arrangements) × (8 observations per pair). Again, only the simultaneous onset time was used, but the intertrial interval was 4 sec. Subjects wrote down "what they heard": one word or two, acceptable words or nonsense. Half of the 16 subjects listened first to the sentence tape and then to the isolated-pair tape, while the others listened in reverse order.

Results

Fusion was significantly greater in the sentence condition than in the isolated-pair condition. All subjects showed this trend (z = 3.8, p < .001). Fusion rate was 89% for sentence trials and 63% for isolated-pair trials. The order in which subjects listened to the tapes was not a significant factor. Fusion rates were comparable for stop + /r/ and stop + /l/ items, as well as for both sets of stimuli in each condition.

Stop + /l/ responses dominated all sentence contexts even when they were semantically inappropriate. For Sentence Frame 2, THE MINISTER PLAYS FOR US occurred on 74% of all trials. Certainly, the minister PLAYING is semantically less likely than PRAYING (which occurred on only 16% of all Sentence Frame 2 trials, regardless of the liquid stimulus presented) even in today's climate of changing roles. Likewise, in Sentence Frame 4, THE TREES ARE GLOWING AGAIN occurred on 83% of all trials, despite the fact that it is not very probable. GROWING responses occurred on only 4% of such trials, regardless of liquid presented. Sentence Frames 1 and 3 yielded a more predictable set of results: In both cases, stop + /l/ fusions are semantically probable and fusions were very frequent. They occurred on 96% and 84% of all trials, respectively. There were virtually no THE TRUMPETER PRAYS FOR US and THE COALS ARE GROWING AGAIN responses, regardless of what liquid stimulus was presented. Other responses to sentence frames were primarily responses in which only the stop stimulus was reported; for example, THE MINISTER PAYS FOR US. Pairs of stimuli in the isolated-pair condition yielded similar liquid substitution results: For example, when PAY/RAY fused, PLAY responses were given 85% of the time, a pattern nearly identical to that for sentence trials. The reverse substitution rarely occurred.

The present experiment showed that meaning at the sentence level cannot nullify the /l/-substitution effect. Relative frequency of occurrence of the fused words cannot account for the substitutions either: GLOWING, for example, is much less frequent than GROWING (Carroll et al., 1971; Thorndike & Lorge, 1944). Word-level considerations, however, may provide a clue. Day (1968) found that, although subjects usually reported hearing GROCERY when given GOCERY/ROCERY, sometimes they reported hearing GLOCERY, a nonword. In the present series of experiments, both the stop + /r/ and stop + /l/ fusions for a given set were acceptable English words. Day (1968), on the other hand, chose stimuli that could fuse meaningfully with only one of the liquids, /l/ or /r/. She found that PAHDUCT/RAHDUCT → PRODUCT and not PLODUCT, and that GEEDY/REEDY → GREEDY not GLEEDY. Such results suggest that meaning at the word level can override the /l/-substitution effect. This effect is considered again in Experiment 10.

Experiment 9: Phoneme Level: Stops and Fricatives

Phonological fusion occurs when phonemes from different dichotic stimuli are perceived as a cluster. Experiments 9 and 10 examined the phonemic components, the stop and the liquid, to assess their importance in the fusion phenomenon.

In Experiments 1-8, stop consonants served as the first phoneme of the to-be-fused consonant-consonant-vowel cluster. The present experiment compared the fusibility of stop/liquid and fricative/liquid pairs. Both initial clusters occur very frequently in English, but Day (1968) reported results of pilot work showing infrequent fusions for stimuli involving fricative phonemes.

Method

In addition to the BED set (BED, LED, RED) and the GO set (GO, LOW, ROW), the fricative stimuli FED and FOE were synthesized. The fricative /f/ was chosen because it is the only fricative in English that clusters with both /r/ and /l/ in initial position. Fricative stimuli were identical to the stop stimuli in duration, pitch, and intensity, and differed only in the acoustic structure of the first 100 msec. BED and GO have formant transitions in this region requisite for the perception of stop consonants, plus a portion (50 msec) of steady state vowel. FED and FOE, on the other hand, began with bandpass noise (above 1,500 Hz) and aspirated transitions requisite for the perception of /f/. A given liquid stimulus such as LED was paired with both a stop consonant stimulus (BED/LED) and a fricative stimulus (FED/LED). All stimuli and possible fusions were English words or names: BED/LED → BLED, BED/RED → BREAD, FED/LED → FLED, FED/RED → FRED, GO/LOW → GLOW, GO/ROW → GROW, FOE/LOW → FLOW, and FOE/ROW → FRO.

One tape was prepared with stop/liquid pairs and another tape with fricative/liquid pairs. Each tape consisted of 120 dichotic trials: (2 sets of stimuli) × (2 consonant/liquid pairs per set) × (3 lead times) × (2 channel arrangements) × (5 observations per pair). Twelve subjects listened to both tapes: half listened first to the fricative/liquid stimuli and then to the stop/liquid stimuli, while the others listened in the reverse order.

Results

Fusions occurred more readily for stop/liquid pairs than fricative/liquid pairs. Fusion rates were 57% and 18%, respectively. This 3:1 ratio was highly significant, with all subjects showing greater fusion rates for stop/liquid items (z = 3.18, p < .001). Unlike the results of Experiments 1 and 3, fusion rate differences of this experiment cannot be accounted for by the relative frequency of the possible fusion responses in English: for example, BLED and GLOW are much less common than FLED and FLOW (Carroll et al., 1971). Furthermore, initial /f/ + liquid clusters occur at about the same frequency as initial /b/ + liquid clusters in English, and considerably more frequently than initial /g/ + liquid clusters (Denes, 1965; Hultzen, Allen, & Miron, 1964).
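The text does not name the test behind the reported z statistics. One reading, offered here only as an assumption, is a normal approximation to the sign test with a continuity correction, applied to the number of subjects showing the trend. Under that assumption, 12 of 12 subjects gives z of about 3.18, matching the value reported above, and 16 of 16 subjects gives 3.75, close to the 3.8 reported for Experiment 8. The function names below are illustrative.

```python
import math

def sign_test_z(n_consistent, n_subjects):
    """Normal approximation to the sign test with a 0.5 continuity correction.

    Under the null hypothesis each subject is equally likely to show the
    trend or its reverse, so the count has mean n/2 and SD sqrt(n)/2.
    """
    mean = n_subjects / 2
    sd = math.sqrt(n_subjects) / 2
    return (abs(n_consistent - mean) - 0.5) / sd

def upper_tail_p(z):
    """One-tailed p value for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z_12_of_12 = sign_test_z(12, 12)   # all 12 subjects show the trend
z_16_of_16 = sign_test_z(16, 16)   # all 16 subjects show the trend
z_7_of_12 = sign_test_z(7, 12)     # the nonsignificant pattern in Experiment 7
print(round(z_12_of_12, 2), round(z_16_of_16, 2), round(z_7_of_12, 2))
```

That the same formula gives a small z for 7 of 12 subjects is consistent with the nonsignificant trends reported in Experiment 7, but the match does not establish which test was actually used.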

The order in which subjects listened to the stop and fricative test tapes was not a significant factor. Fusion rates for stop + /l/ and stop + /r/ stimuli were within a few percentage points. Fricative + /l/ and fricative + /r/ fusion rates were also comparable. The /l/-for-/r/ substitutions were frequent for stop/liquid pairs, but not for fricative/liquid pairs. In fact, /r/ substitutions occurred on 64% of all trials in which fricative + /l/ stimuli fused, while corresponding /l/ substitutions were rare. A second type of substitution also occurred. Fricative/liquid stimuli did not always yield fricative + liquid responses; for example, FED/RED → BRED. In fact, about 70% of all fricative/liquid pair fusions were actually stop + liquid responses. Stop-for-fricative substitutions were not the result of poor fricative stimuli, since subjects identified them in isolation on a diotic test with a high degree of accuracy. Instead, these substitutions appear to be an extension of the differences in fusibility between the stops and fricatives.

Experiment 10: Phoneme Level: Liquids and Semivowels

The present experiment examined the second consonant in the to-be-fused cluster. Stimuli beginning with semivowels (i.e., /w/ and /y/) were generated to see whether they would fuse as readily as the liquids, and to see whether the /l/-substitution effect would be extended to /w/ and /y/.

Method

Two sets of stimuli were used: the KICK set (KICK, LICK, RICK used previously, plus WICK) and the COO set (COO, LIEU, RUE, YOU) generated on the Haskins synthesizer. The only stop consonant that clusters with all liquids and semivowels in English is /k/; yet /ky/ occurs only before the vowel /u/, while /kw/ does not occur before /u/. Thus, it was necessary to synthesize two sets of stimuli, one for /l, r, w/ and the other for /l, r, y/. (Note that LIEU could also be represented as LOU.) Liquid and semivowel stimuli within the same set were identical in all respects except for the direction and slope of the second- and third-formant transitions, as shown in Figure 2. These are the cues that distinguish all liquids and semivowels (Lisker, 1957; O'Connor et al., 1957).

All stimuli and possible fusion responses for both sets were common English words or names: KICK/LICK → CLICK, KICK/RICK → CRICK, KICK/WICK → QUICK, COO/LIEU → CLUE, COO/RUE → CREW, and COO/YOU → CUE. A tape was prepared with 108 dichotic items: (2 sets of stimuli) × (3 liquid and semivowel stimuli per set) × (3 lead times) × (2 channel arrangements) × (3 observations per pair). Twelve subjects listened to two passes through the tape, reversing headphones after the first pass.

Results and Discussion

KICK/LICK, KICK/RICK, and KICK/WICK pairs all fused at an average rate of 70%, while COO/LIEU, COO/RUE, and COO/YOU all fused at an average rate of 42%. There were no significant differences within each set. However, regardless of which stimuli were presented, most responses were stop + /l/; /l/ was substituted for /r/, as in previous studies, and it was also substituted for /w/ and /y/. CLICK and CLUE responses occurred in 89% of all trials in which fusions occurred. Again, word frequency of possible fusions cannot account for the substitutions. For example, QUICK is much more common than CLICK, and CREW is more common than CLUE (Carroll et al., 1971). Nevertheless, the /l/ substitutions for both sets of stimuli yielded relatively common English words. The data of Day (1968) suggest that when /l/ substitutions do not yield acceptable words, they occur considerably less often.

Difficulties with /r/ versus /l/ occur in a wide variety of other situations besides phonological fusion. Children, for example, have more difficulty in pronouncing /r/ than /l/ and sometimes pronounce both phonemes as /l/ (Morley, 1957; Murray, 1962; Powers, 1957); the deaf have more trouble processing /r/ than /l/, and often hear /l/ in both cases (Rosen, 1962). The articulation pattern of /r/ is less stable than /l/ (Bronstein, 1960; Delattre, Note 3), and /r/ is more difficult to pronounce correctly under conditions of delayed auditory feedback than /l/ (Applegate, 1968). The dichotic fusion results are complementary to these studies: Perhaps /r/ is less stable than /l/ in perception and production, and the fusion results of this study are a manifestation of /r/ instability.

FIGURE 2. Spectrograms of liquid and semivowel stimuli used in Experiment 10.

Overview of Experiments 9 and 10

When the first consonant in the to-be-fused cluster is a stop, fusion rate is high, but when it is a fricative, fusion occurs considerably less often. The role of the second consonant in the to-be-fused cluster is less clear: Fusion occurred equally well for all stop/liquid and stop/semivowel pairs, yet all pairs tended to yield a stop + /l/ response.

Experiment 11: Acoustic Level: Liquid Transitions

Linguistic cues at the sentence and phoneme levels affect fusion rate. Perhaps linguistic cues at the level of acoustic structures are also important. Since the liquid is perceptually interpolated between the stop and the vowel in the fusion response, one key to fusion may lie in the acoustic structure of the liquid. Experiments 11-13 examine aspects of the acoustic structure of liquids.

Experiment 10 showed that for the present sets of stimuli, liquids (/r, l/) and semivowels (/w, y/) tend to yield stop + /l/ fusions. It would appear that any combination of rising and falling second- and third-formant transitions is sufficient for stop + /l/ fusions to occur. If any combination is sufficient for fusion, perhaps no transitions are needed at all. For example, PAY/AY, a pair without any liquid transitions, might also yield stop + /l/ fusions. The present study varied the slope of the liquid transitions to determine the extent of formant transitions necessary for fusion to occur and to gain additional support for the view that fusion is not a response bias.

FIGURE 3. Results of identification and fusion tasks involving liquid-to-vowel stimulus arrays in Experiment 11.

Method

The PAY set and the KICK set were altered to include five stimuli, one stop stimulus and four stimuli that formed a liquid-to-vowel continuum. At one end of the continuum, Stimulus 1 had full liquid transitions in all formants, as found in LAY and LICK. At the other end of the continuum, Stimulus 4 had the same duration but began with a steady state vowel, AY and ICK. Between the extremes were Stimuli 2 and 3 with intermediate amounts of formant transitions. Equal increments of acoustic change occurred between successive stimuli. Sixteen Yale undergraduates participated in two tasks, dichotic fusion and then diotic identification of the liquid-to-vowel stimuli in isolation. Since the results of the identification task were highly relevant to the fusion task, those data are discussed first.

A diotic identification tape of 80 randomized liquid-to-vowel items was prepared: (2 sets of stimuli) × (4 stimuli per array) × (10 observations per stimulus). There was a 3-sec interval between each item. Subjects wrote down the single item that they heard presented on each trial. Dichotic items were constructed by pairing the stop stimuli with all items in the liquid-to-vowel arrays. The tape consisted of 96 pairs: (2 sets of stimuli) × (4 stimuli per array) × (3 lead times) × (2 channel arrangements) × (2 observations per pair). Subjects listened to two passes through the tape, reversing headphones after the first pass. As usual they wrote their responses, indicating what they heard.

Results and Discussion

Stimuli 1 and 2 were identified as beginning with /l/ on 88% of all trials, as shown in the top panel of Figure 3. Stimuli 3 and 4 were identified as beginning with /l/ on only 8% of all trials. All subjects showed this quantal trend. There was no difference between the LAY-to-AY and LICK-to-ICK stimulus arrays.

Fusion occurred at a rate of 52% for pairs containing Stimulus 1 or Stimulus 2, the stimuli which had been identified as beginning with a liquid. Other pairs yielded only 6% fusions (stop + liquid responses), as shown in the lower panel of Figure 3. Thus, liquid-like transitions are necessary for phonological fusion: Fusion occurs in direct proportion to the extent that liquid-like items are indeed perceived as liquids in isolation.

Experiment 12: Acoustic Level: Degraded Liquids

Experiment 11 showed that transitions in the liquid stimulus were necessary for fusion to occur. The present experiment was designed to determine which formant transition (or combination of transitions) in the liquid is necessary for fusion.

Method

The PAY set and the KICK set were again used. Liquid stimuli appeared in many forms. Some were degraded in that certain formants were omitted from their acoustic structure. Figure 4 shows the component parts of liquid stimuli. All possible combinations of all formants were used. Eleven liquid-like stimuli resulted in each set: 2 three-formant stimuli identical to those used in previous studies, 5 two-formant stimuli, and 4 one-formant stimuli, as listed in Table 2.

Each of the 11 liquid-like stimuli was paired with its appropriate stop stimulus. In addition, two control pairs were constructed per set: One pair was a stop/stop pair (PAY/PAY and KICK/KICK like those used by Day, 1968), and the other was a stop presented to one ear and nothing to the other (PAY/___ and KICK/___). No fusion responses should occur for control pairs if fusion occurs only for pairs containing liquid-like stimuli. Twelve subjects listened to a dichotic tape of 156 items: (2 sets of stimuli) × (13 pairs per set) × (3 lead times) × (2 channel arrangements per pair).
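The count of 11 liquid-like stimuli can be reproduced by enumerating every nonempty combination of the three formants. The sketch below assumes, as the row labels of Table 2 suggest, that only the third formant differs between /l/ and /r/, so a combination yields separate /l/ and /r/ versions only when it includes the third formant. That assumption and all names are illustrative.

```python
from itertools import combinations

FORMANTS = (1, 2, 3)

# Assumption for this sketch: F1 and F2 are shared by /l/ and /r/, while F3
# differs, so only combinations containing F3 come in /l/ and /r/ versions.
liquid_like_stimuli = []
for size in (3, 2, 1):
    for subset in combinations(FORMANTS, size):
        if 3 in subset:
            liquid_like_stimuli.append((subset, "l"))
            liquid_like_stimuli.append((subset, "r"))
        else:
            liquid_like_stimuli.append((subset, None))

counts_by_size = {
    size: sum(1 for subset, _ in liquid_like_stimuli if len(subset) == size)
    for size in (3, 2, 1)
}
pairs_per_set = len(liquid_like_stimuli) + 2   # plus stop/stop and stop/blank controls
tape_items = 2 * pairs_per_set * 3 * 2         # sets x pairs x lead times x channels
print(counts_by_size, pairs_per_set, tape_items)
```

The enumeration gives 2 three-formant, 5 two-formant, and 4 one-formant stimuli, and with the two control pairs it reproduces the 156-item tape.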

Results and Discussion

There were two general fusion rates: Most pairs fused at a rate of about 55%, but a few pairs fused at a considerably lower rate, as shown in Table 2. Pairs that rarely fused contained liquid-like stimuli with only the first formant or the third formant of /l/. Figure 4 shows that these stimuli lacked formant transitions in the mid-frequency range (1,000-2,000 Hz), while all other liquid-like stimuli had transitions in this region (either the second formant or the third formant of /r/).

FIGURE 4. Schematic spectrograms of liquid stimuli and their component parts used in Experiment 12. (F1 = first formant, F2 = second formant, F3 = third formant.)

TABLE 2
PERCENT FUSION RESPONSES FOR ALL PAIRS IN EXPERIMENT 12

Dichotic pairs                     Percent fusion

Stop + three-formant liquids
  1, 2, 3 /l/                      57
  1, 2, 3 /r/                      51

Stop + two-formant liquids
  1, 2                             57
  1, 3 /l/                         23
  1, 3 /r/                         51
  2, 3 /l/                         55
  2, 3 /r/                         54

Stop + one-formant liquids
  1                                19
  2                                60
  3 /l/                            16
  3 /r/                            64

Control items
  Stop + stop                       6
  Stop + blank                      2

Stop + three-formant liquid stimuli fused at rates comparable to previous studies: 54%. Stop + two-formant liquid pairs fused at a rate of 52%, with the exception of the pairs with only first and third formants of /l/, where fusion level was only 23%. All subjects showed this drop in fusion rate (z = 3.18, p < .001). Stop + one-formant liquid pairs also showed high and low fusion rates, following a pattern similar to that of pairs with two-formant liquids. Again, all subjects showed these quantal differences in fusion rate. As in previous studies, stop + /l/ responses occurred on more fused trials than did stop + /r/ responses. Stop/stop pairs yielded few stop + /l/ responses. Such responses would be "false fusions," since the subject would be reporting a liquid when, in fact, none was presented (Day, 1968).

A specific acoustic cue for phonological fusion of stop/liquid stimuli appears to be the second-formant transition or the third-formant /r/ transition of the liquid stimulus. A single rising formant transition in the range 1,000-2,000 Hz is necessary for fusion to occur.
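The split between high- and low-fusing pairs can be made explicit by grouping the Table 2 entries according to whether the liquid-like stimulus contains a mid-frequency transition (the second formant, or the third formant of /r/). The sketch below assumes the flattened table values pair with the row labels in the order listed; the dictionary keys and the helper function are illustrative.

```python
# Percent fusion for the 11 stop + liquid-like pairs, read from Table 2.
# Keys list the formants present, followed by the liquid when F3 is included.
table2 = {
    "1,2,3 /l/": 57, "1,2,3 /r/": 51,
    "1,2": 57, "1,3 /l/": 23, "1,3 /r/": 51, "2,3 /l/": 55, "2,3 /r/": 54,
    "1": 19, "2": 60, "3 /l/": 16, "3 /r/": 64,
}

def has_midrange_transition(label):
    """True if the stimulus includes F2, or F3 taken from /r/."""
    formants, _, liquid = label.partition(" ")
    present = formants.split(",")
    return "2" in present or ("3" in present and liquid == "/r/")

with_cue = [pct for label, pct in table2.items() if has_midrange_transition(label)]
without_cue = [pct for label, pct in table2.items() if not has_midrange_transition(label)]

mean_with_cue = sum(with_cue) / len(with_cue)
mean_without_cue = sum(without_cue) / len(without_cue)
print(round(mean_with_cue, 1), round(mean_without_cue, 1))
```

On this reading the eight pairs with a mid-frequency transition average roughly 56%, in line with the "about 55%" figure in the text, while the three pairs without one average under 20%.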

FIGURE 5. Schematic spectrograms of liquid stimuli and liquid "chirps" used in Experiment 13. (The original second-formant transition from the liquid stimuli is the rising chirp stimulus designated 2r.)

Experiment 13: Acoustic Level: Liquid "Chirps"

The previous study found that a mid-range formant transition was necessary for fusion to occur. The present study was designed to determine whether that transition or any other is sufficient for fusion when paired with a stop stimulus. When the second-formant transition is removed from a liquid stimulus and synthesized by itself, it sounds similar to a bird's twitter, hence the name "chirp." Mattingly, Liberman, Syrdal, and Halwes (1971) and Wood (1975) have demonstrated that these chirps are not processed as speech. Chirp stimuli have two general features, relative frequency range and direction (rising vs. falling).

Method

Two stop stimuli were synthesized, PAY and KICK. Liquid stimuli were degraded so that only the second-formant transition remained, a 100-msec chirp rising rapidly from a value of 1,200 Hz to 1,800 Hz. This and other chirps were synthesized at the same amplitude as the second-formant transition in the full liquid stimulus. Twelve chirp stimuli were used; there were four frequency values for rising, falling, and steady state chirps, as shown in Figure 5. Specific stimuli are numbered from low to high according to ordinal position on the frequency scale, and each is given a superscript to designate the class to which it belongs. Endpoints for rising and falling chirps were 600, 1,200, 1,800, 2,400, and 3,000 Hz, while steady state chirps had frequencies of 900, 1,500, 2,100, and 2,700 Hz. The original second-formant transition from the liquid stimuli is the rising chirp stimulus designated 2r.
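The twelve chirps can be generated from the five endpoint frequencies: each adjacent pair of endpoints defines one frequency band, and each band gives a rising, a falling, and a steady state chirp. In the sketch below, each chirp is stored as a (start Hz, end Hz) pair. The "r" suffix follows the designation 2r used in the text; the "f" and "s" suffixes are hypothetical labels introduced here for the falling and steady state classes.

```python
ENDPOINTS_HZ = [600, 1200, 1800, 2400, 3000]

# Map each chirp label to its (start frequency, end frequency) in Hz.
chirps = {}
bands = zip(ENDPOINTS_HZ, ENDPOINTS_HZ[1:])
for number, (low, high) in enumerate(bands, start=1):
    midpoint = (low + high) // 2
    chirps[f"{number}r"] = (low, high)           # rising
    chirps[f"{number}f"] = (high, low)           # falling (label is hypothetical)
    chirps[f"{number}s"] = (midpoint, midpoint)  # steady state (label is hypothetical)

steady_state_hz = [chirps[f"{n}s"][0] for n in range(1, 5)]
print(len(chirps), chirps["2r"], steady_state_hz)
```

The steady state frequencies listed in the Method (900, 1,500, 2,100, and 2,700 Hz) fall at the midpoints of the four bands, which is what the sketch relies on.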

In addition to the stop/chirp pairs, there were control pairs. Ordinary fusible items such as PAY/LAY and PAY/RAY were employed to obtain baseline fusion rates. Stop/stop pairs were also included to set a lower boundary on fusion rate since Experiment 12 found few stop + liquid responses to such trials. Thus the control pairs provided boundary conditions within which to compare the fusion rates for stop/chirp items. Three lead times were used such that stop/chirp stimulus pairs had the same temporal relationships as the stop and the second-formant transition of the full liquid in previous experiments. Twelve subjects listened to a dichotic tape of 180 items: (2 sets of stimuli) × (15 pairs per set, 12 stop/chirp pairs plus such control pairs as PAY/LAY, PAY/RAY, and PAY/PAY) × (3 lead times) × (2 channel arrangements per pair).

Results

Fusions, or stop + liquid responses, occurred at a substantially reduced rate for all stop/chirp pairs. While fusion rate for stop/liquid control pairs was 47%, a rate comparable to previous studies, fusion rates for stop/chirp pairs averaged only 8%. This difference was highly significant (z = 3.18, p < .001).

Fusion did occur, however, for selected stop/chirp pairs as shown in Table 3. Pairs with rising chirps yielded an average fusion rate of 14%, higher than all other stop/chirp pairs combined (z = 2.6, p < .005). Within this category of rising chirps, fusion occurred most readily for pairs involving chirp number 2r. Eight of 12 subjects fused at a higher rate for these pairs than for any other stop/chirp pair (z = 4.0, p < .0001), but even here fusions occurred significantly less often than for the stop/liquid control pairs (z = 3.18, p < .001). Thus, even the chirp stimulus most appropriate to the full liquid stimulus is not entirely sufficient for fusion to occur at an unreduced rate.

TABLE 3
PERCENT FUSION RESPONSES FOR STOP/CHIRP PAIRS IN EXPERIMENT 13

                                   Class of chirp stimulus
Stimuli in the frequency domain    Rising    Falling    Steady state    M

1 (600-1,200 Hz)                   14        5          5               8
2 (1,200-1,800 Hz)                 24        6          7               12
3 (1,800-2,400 Hz)                 12        3          5               7
4 (2,400-3,000 Hz)                  7        8          3               6

M                                  14        6          5

The /l/-for-/r/ substitutions occurred for stop/liquid control pairs at rates comparable to previous studies. Fusions of stop/chirp pairs, however, were not dominated by stop + /l/ responses. In fact, only stop/2r pairs yielded more stop + /l/ fusions than stop + /r/, /w/, or /y/ fusions. Fusions for the lowest frequency chirps paired with a stop stimulus were dominated by stop + /w/ responses, whereas those for highest frequency chirps were dominated by stop + /y/ and stop + /l/ responses. False fusion responses for stop/stop pairs occurred only 2% of the time.
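The marginal means in Table 3 can be recomputed from the twelve cell values as a check on how the flattened table is read. The sketch assumes the cells pair with the four frequency bands in the order listed and that each cell is based on the same number of trials; the names are illustrative.

```python
# Percent fusion by chirp class, for frequency bands 1 to 4 (low to high).
table3 = {
    "rising": [14, 24, 12, 7],
    "falling": [5, 6, 3, 8],
    "steady state": [5, 7, 5, 3],
}

# Column means: one per chirp class.
class_means = {name: sum(cells) / len(cells) for name, cells in table3.items()}

# Row means: one per frequency band, averaging across the three classes.
band_means = [sum(row) / len(row) for row in zip(*table3.values())]

# Grand mean across all twelve stop/chirp pairs.
all_cells = [pct for cells in table3.values() for pct in cells]
grand_mean = sum(all_cells) / len(all_cells)

print(class_means, [round(m, 1) for m in band_means], grand_mean)
```

Under those assumptions the class means (about 14, 6, and 5), the band means (about 8, 12, 7, and 6), and the grand mean of about 8% agree with the reported margins and with the overall stop/chirp rate given in the Results.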

Overview of Experiments 8-13

Three levels of linguistic cues were explored (the semantic level, the phoneme level, and the acoustic level), and the effect of cues at each level on fusion rate was observed. The results suggest that the cognitive processes involved in phonological fusion are influenced by cues at all three linguistic levels. Fusion rate was enhanced when fusible pairs were imbedded in sentence contexts; fusion occurred best for certain classes of phonemes; and specific acoustic cues were found to be important for fusion. Thus, linguistic cues within and outside of the consonant/liquid pairs have a distinct effect on fusion rate. Given the appropriate experimental situation, cues from all three levels appear to work in concert. Consider one of the sentence pairs from Experiment 8: THE TREES ARE GOING AGAIN presented to one ear and THE TREES ARE ROWING AGAIN to the other. Semantic cues at the sentence level increased fusion rate for GO/ROW pairs beyond the rate they yielded when presented in isolation. Cues at the phoneme level were also influential: Experiment 9 demonstrated that GO/ROW pairs fused at a higher rate than FOE/ROW pairs. Experiment 12 found that the second-formant transition in the liquid was a specific cue for fusion. Moreover, this cue appears to be pertinent to the /l/-for-/r/ substitutions: Fusion rates and /l/ substitutions were approximately equal for stop/second-formant and stop/liquid pairs. Thus, the high rate of THE TREES ARE GLOWING AGAIN responses appears to be the result of the synergy of cues from three very different levels of language. For further discussion of linguistic influences on phonological fusion, see Cutting and Day (1975).

ADDITIONAL COMMENTS

Day (1973; Note 1) found marked individual differences in phonological fusion using natural speech disyllabic pairs. In the present series of fusion experiments, using synthetic speech monosyllabic fusible items, such individual differences did occur, but to an extent less marked than that found by Day. Consider fusion rates for the 140 individuals in Experiments 1, 2, and 4-10 (the other 48 individuals are eliminated because fusion broke down in their tasks), and consider the number of subjects whose fusion rates fell in the five quintiles between 0% and 100% fusion. Only 7 subjects' rates fell between 0% and 20%, whereas 39 fell between 21% and 40%, and 54 between 41% and 60%. Only 14 subjects' fusion rates were between 61% and 80%, whereas 26 were between 81% and 100%. This overall bimodal trend is indicative of subjects within each of the nine experiments in question. It is similar to that found by Day, but the lower fusion mode has been shifted "upward" in frequency through the use of higher fusing synthetic pairs.

Phonological fusion has a direct analog in vision. In fact, Ruth Day discovered this phenomenon in search of an auditory parallel to the work of Rommetveit and his coworkers (Rommetveit, Berkeley, & Brøgger, 1968; Rommetveit & Kleven, 1968; Rommetveit, Toch, & Svendsen, 1968a, 1968b). They found that when letters such as SHAR were presented to one eye and SHAP to the other eye, many subjects reported seeing SHARP.

CONCLUSION

Phonological fusion occurs in dichotic listening in a very systematic fashion. Within a relatively wide range of variation, temporal alignment and physical attributes of the signals appear to matter very little. Linguistic aspects of the stimuli, however, matter quite a lot. Stop consonants and liquids fuse most readily, and fuse in the order in which they are phonologically reasonable in English. Given a stop stimulus and a liquid stimulus of the same general pattern, linguistic attributes within and outside of the dichotic pairs contribute to the percept.

REFERENCE NOTES

1. Day, R. S. Temporal-order judgments in speech: Are individuals language-bound or stimulus-bound. Paper presented at the meeting of the Psychonomic Society, St. Louis, Missouri, November 1969. (Also in Haskins Laboratories Status Report on Speech Research, 1969, SR-21/22, 71-87.)

2. Day, R. S., & Cutting, J. E. Levels of processing in speech perception. Paper presented at the meeting of the Psychonomic Society, San Antonio, Texas, November 1970.

3. Delattre, P. C. A dialect study of American /r/ by x-ray motion picture. The general phonetic characteristics of languages (Final Report). University of California at Santa Barbara, 1967.

REFERENCES

Applegate, R. B. Segmental analysis of articulatory errors under delayed auditory feedback. Project on Linguistic Analysis (University of California at Berkeley), 1968, 8, 1-27.

Bronstein, A. M. The pronunciation of American English. New York: Harper & Row, 1960.

Carroll, J. B., Davies, P., & Richman, B. (Eds.). Word frequency book. New York: Houghton & Mifflin, 1971.

Cooper, F. S., & Mattingly, I. G. Computer-controlled PCM system for investigation of dichotic speech perception. Journal of the Acoustical Society of America, 1969, 46, 115. (Abstract)

Cutting, J. E., & Day, R. S. The perception of stop-liquid clusters in phonological fusion. Journal of Phonetics, 1975, 3, 9-23.

Day, R. S. Fusion in dichotic listening (Doctoral dissertation, Stanford University, 1968). Dissertation Abstracts International, 1969, 29, 2649B. (University Microfilms No. 69-211)

Day, R. S. Temporal-order perception of a reversible phoneme cluster. Journal of the Acoustical Society of America, 1970, 48, 95. (Abstract)

Day, R. S. Digit span memory in language-bound and stimulus-bound subjects. Journal of the Acoustical Society of America, 1973, 54, 287. (Abstract)

Denes, P. B. On the statistics of spoken English. Journal of the Acoustical Society of America, 1965, 35, 892-904.

Hultzen, L., Allen, J., & Miron, M. Tables of transitional frequencies of English phonemes. Urbana: University of Illinois Press, 1964.

Lisker, L. Minimal cues for separating /w, r, l, y/ in intervocalic position. Word, 1957, 13, 257-267.

Mattingly, I. G., Liberman, A. M., Syrdal, A. K., & Halwes, T. Discrimination in speech and nonspeech modes. Cognitive Psychology, 1971, 2, 131-157.

Morley, M. E. Development of speech disorders in childhood. London: Livingstone, 1957.

Murray, R. S. The development of /r/ in the speech of preschool children (Doctoral dissertation, Stanford University, 1962). Dissertation Abstracts International, 1963, 23, 4462. (University Microfilms No. 63-2734)

O'Connor, J. D., Gerstman, L. S., Liberman, A. M., Delattre, P. C., & Cooper, F. S. Acoustic cues for the perception of initial /w, y, r, l/ in English. Word, 1957, 13, 25-43.

Pisoni, D. B. Perceptual processing time for consonants and vowels. Journal of the Acoustical Society of America, 1973, 53, 369. (Abstract)

Powers, M. H. Functional disorders of articulation: Symptomatology and etiology. In L. E. Travis (Ed.), Handbook of speech pathology. New York: Appleton-Century-Crofts, 1957.

Rommetveit, R., Berkeley, M., & Brøgger, J. Generation of words from tachistoscopically presented nonword strings of letters. Scandinavian Journal of Psychology, 1968, 9, 150-156.

Rommetveit, R., & Kleven, J. Word generation: A replication. Scandinavian Journal of Psychology, 1968, 9, 277-281.

Rommetveit, R., Toch, H., & Svendsen, D. Effects of contingency and contrast on the cognition of words. Scandinavian Journal of Psychology, 1968, 9, 138-144. (a)

Rommetveit, R., Toch, H., & Svendsen, D. Semantic, syntactic, and associative context effects in stereoscopic rivalry situation. Scandinavian Journal of Psychology, 1968, 9, 145-149. (b)

Rosen, J. Phoneme identification in sensorineural deafness (Doctoral dissertation, Stanford University, 1962). Dissertation Abstracts International, 1962, 23, 2253. (University Microfilms No. 62-5510)

Studdert-Kennedy, M., Shankweiler, D. P., & Schulman, S. Opposed effects of a delayed channel on perception of dichotically and monotically presented CV syllables. Journal of the Acoustical Society of America, 1970, 48, 599-602.

Thorndike, E. L., & Lorge, I. The teacher's wordbook of 30,000 words. New York: Columbia University, Teachers College, Bureau of Publications, 1944.

Turvey, M. T. On peripheral and central processes in vision: Inferences from an information-processing analysis of masking with patterned stimuli. Psychological Review, 1973, 80, 1-52.

Wood, C. C. Auditory and phonetic levels of processing in speech perception: Neurophysiological and information-processing analyses. Journal of Experimental Psychology: Human Perception and Performance, 1975, 104, 3-20.

(Received May 1, 1974)