
Cognition 125 (2012) 317–323


Brief article

Immediate use of prosody and context in predicting a syntactic structure

Chie Nakamura a,c,e,*, Manabu Arai b,c,e, Reiko Mazuka c,d

a Graduate School of Science and Technology, Keio University, Japan
b Department of Language and Information Sciences, University of Tokyo, Japan
c Laboratory for Language Development, RIKEN Brain Science Institute, Japan
d Psychology and Neuroscience, Duke University, USA
e Japan Society for the Promotion of Science, Japan


Article history: Received 22 December 2011; Revised 13 July 2012; Accepted 16 July 2012; Available online 14 August 2012

Keywords: Prosody; Contrastive intonation; Context; Prediction; Anticipatory eye-movements; Structural ambiguity

http://dx.doi.org/10.1016/j.cognition.2012.07.016

* Corresponding author. Address: Laboratory for Language Development, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi, Saitama 351-0918, Japan. Tel.: +81 48 462 1111x6748; fax: +81 48 467 9760. E-mail address: [email protected] (C. Nakamura).

Numerous studies have reported an effect of prosodic information on parsing, but whether prosody can impact even the initial parsing decision is still not evident. In a visual world eye-tracking experiment, we investigated the influence of contrastive intonation and visual context on processing temporarily ambiguous relative clause sentences in Japanese. Our results showed that listeners used the prosodic cue to make a structural prediction before hearing disambiguating information. Importantly, the effect was limited to cases where the visual scene provided an appropriate context for the prosodic cue, thus eliminating the explanation that listeners had simply associated marked prosodic information with a less frequent structure. Furthermore, the influence of the prosodic information was also evident following disambiguating information, in a way that reflected the initial analysis. The current study demonstrates that prosody, when provided with an appropriate context, influences the initial syntactic analysis and also the subsequent cost at disambiguating information. The results also provide the first evidence for pre-head structural prediction driven by prosodic and contextual information with a head-final construction.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

To comprehend spoken language, listeners need to analyze the speech signal according to the language-specific structure of prosody, which includes cues such as tone, intonation, rhythm, and stress. Previous research showed that speakers provide prosodic cues to disambiguate structures and that listeners use these cues to guide their structural analysis (Schafer, Speer, Warren, & White, 2000; Snedeker & Trueswell, 2003). For instance, Snedeker and Trueswell (2003) found that speakers prosodically distinguish between the alternative structures of globally ambiguous phrases such as Tap the frog with the flower and that the location of the prosodic boundary directly affects listeners' syntactic analyses (see also Kjelgaard & Speer, 1999; Schafer et al., 2000; Snedeker & Casserly, 2010; Speer, Kjelgaard, & Dobroth, 1996). Schafer, Carter, Clifton, and Frazier (1996) demonstrated that focal accent influences how listeners resolve attachment ambiguity of a complex NP followed by a relative clause modifier such as the propeller of the plane that... These results demonstrate that prosodic information is used online in resolving structural ambiguities (see also Marslen-Wilson, Tyler, Warren, & Lee, 1992). However, it is not yet certain whether it is used immediately for predicting a syntactic structure prior to disambiguating information. The current study addresses this issue by examining the influence of contrastive intonation in combination with contextual information on predictive structural processing with temporarily ambiguous relative clause sentences in Japanese.


1 Japanese aspectual marker -te i- in the RC verb notteita can also indicate progressive meaning ('was riding') as well as resultative meaning as in the example. It is known that the choice of its meaning is highly dependent on context and verb sense (Shirai, 1998). In our study, the visual scene provides sufficient information to disambiguate the meaning.


Fig. 1. Example visual scenes for Non-Contrastive context (a) and Contrastive context (b).


The finding of such an anticipatory effect would provide clear evidence for an influence of prosodic information on the initial syntactic analysis. It would also provide the first evidence for pre-head structural prediction driven by prosodic and contextual information with a head-final construction (cf. Kamide, Altmann, & Haywood, 2003).

Most previous studies do not provide evidence for the immediate and truly interactive influence of prosodic information on initial syntactic analysis because they either relied on offline measures or examined the processes following disambiguating information. An exception is a study by Weber, Grice, and Crocker (2006b), who tested SVO and OVS structures in German. The two types of sentences were identical (thus temporarily ambiguous) up to the verb due to the case-ambiguous sentence-initial NP but carried different intonation patterns (the nuclear stress accent was on the verb in the SVO structure whereas it appeared on the initial NP in the OVS structure). Their results revealed different patterns of eye-movements between the two structures prior to the disambiguating sentence-final NP, demonstrating the influence of prosodic information on structural prediction. However, it is possible that the different looking patterns for the two structures reflected a difference between the default looking pattern associated with the canonical SVO structure and the disrupted pattern due to the non-default (i.e., marked) intonation pattern for the less frequent OVS structure. This implies that the difference may be due to an absence of SVO prediction with the OVS-type prosody rather than the presence of OVS prediction itself. It therefore remains unclear whether prosody can indeed drive a structural prediction (see also Snedeker & Trueswell, 2003).

The current study examined the influence of contrastive intonation in predicting a syntactic structure. Previous studies showed that contrastive pitch accents evoke a notion of contrast in a discourse context and facilitate the processes of identifying an upcoming referent in spontaneous dialog (Ito, Jincho, Minai, Yamane, & Mazuka, 2012; Ito & Speer, 2008; Weber, Braun, & Crocker, 2006a). Our study examined the impact of the contrastive intonation placed on the relative clause (henceforth RC, underlined below) on the processing of temporarily ambiguous RC sentences in Japanese such as (1).

(1) Otokonoko-ga sanrinsha-ni notteita onnanoko-o mitsumeta.
Boy-NOM [tricycle had been riding] girl-ACC stared at
'The boy stared at the girl who had been riding the tricycle.'

In Japanese, RCs precede lexical heads without an overt complementizer or any grammatical marking on the verb within the RC (henceforth RC verb). This creates a local syntactic ambiguity: The sentence is ambiguous between the main clause (henceforth MC) and the RC structure up to the RC verb. Due to the strong preference for the MC structure, people typically analyze the VP (sanrinsha-ni notteita, 'had been riding the tricycle') as part of the MC for which the sentence-initial NP (otokonoko-ga, 'boy') is the subject (Inoue & Fodor, 1995; Mazuka & Itoh, 1995).1 They are forced to revise the analysis for the RC on encountering the RC-head (onnanoko-o, 'girl').

In addition to prosody, we also manipulated visual context using the visual world eye-tracking technique (Cooper, 1974). It was designed either to support the use of contrastive intonation (Contrastive context, Fig. 1b) or not to support it (Non-Contrastive context, Fig. 1a). Both types of context depicted four entities, three of which corresponded to the referents in the sentence: subject (boy), RC object (tricycle), and RC-head (girl). The fourth entity in the Contrastive context stood as a contrast to the RC-head entity (another girl on a hobbyhorse, Fig. 1b); in the Non-Contrastive context it was a distractor that did not stand as a contrast (an adult woman on a bicycle, Fig. 1a).



Fig. 2. F0 contour of the sentence (1) without contrastive intonation (a) and with it (b).

2 The difference in F0 peaks between the sentence-initial NP and the following NP was significantly larger for the items with contrastive intonation compared to those without (t(27) = 15.17, p < 0.001 by a paired t-test). There was also a difference in the pause length prior to the RC between the items with contrastive intonation and those without (mean difference 52 ms; t(27) = 2.65, p < 0.05 by a paired t-test). This pause may possibly affect syntactic analysis with this structure independently of visual context. As shown in the results section, however, we found no evidence to support this. There is another possible confound for our prosodic manipulation: The pitch range is usually reset prior to a new clause in Japanese (cf. Uyeno, Hayashibe, Imai, Imagawa, & Kiritani, 1980; Venditti, 1994), which could also affect syntactic analysis independently. However, again, we observed no supporting evidence.


Our manipulation of visual context is in essence similar to the study of Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy (1995), which showed that a contrastive context facilitated the processing of an ambiguous postnominal modifier, a prepositional phrase following a head noun (i.e., on the towel in 'Put the apple on the towel in the box'). With our head-final construction, it is possible that the Contrastive context would drive listeners' expectation for a prenominal modifier even before hearing the head. However, due to the strong preference for the MC structure, contextual information alone may not be able to activate the infrequent RC analysis. Thus, we expected that contrastive intonation on the RC would play a critical role: When a contrast in the visual scene is highlighted by contrastive intonation on the RC (i.e., emphasizing that someone had been riding a tricycle but not a hobbyhorse), the modifier interpretation (i.e., the RC analysis) may be accessed even before the referent is mentioned, as it tells listeners which girl is being referred to. Therefore, anticipatory eye-movements toward the to-be-mentioned referent (i.e., the girl who is not on the hobbyhorse) would indicate the prediction of an RC-head and thus reflect the listeners' RC analysis before the sentence was disambiguated, because the alternative MC interpretation (i.e., 'The boy had been riding the tricycle') does not require any further linguistic material following the verb. On the other hand, we expected that prosody would not affect eye-movements in the Non-Contrastive context, because the prosodic cue in the absence of a contrastive pair would likely be interpreted as a simple emphasis of the dative NP in the default MC analysis (i.e., emphasizing that the boy had ridden the tricycle but not other things).

Furthermore, we also investigated the influence of the prosodic cue following the disambiguating RC-head NP to see whether structural prediction would affect the subsequent cost at the disambiguating information. Such an effect is predicted by processing models such as Hale's (2001) surprisal model, which estimates processing cost based on the change in the probability distribution over possible analyses from one word to the next (see also Levy, 2008).
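For concreteness, the surprisal of a word given its preceding context, in the standard formulation of such models (our summary, not a quotation from the article), is

    surprisal(w_k) = −log P(w_k | w_1 ... w_(k−1)),

so a cue that raises the probability of the RC analysis before the head is encountered should lower the predicted cost at the disambiguating RC-head.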

2. Experiment

2.1. Method

2.1.1. Participants

Twenty-eight native speakers of Japanese with normal visual acuity and hearing participated in the experiment for monetary compensation.

2.1.2. Materials

Twenty-eight experimental items were created. Each item consisted of an auditory sentence and a corresponding scene. The auditory stimuli were recorded by a female native speaker of Japanese with standard accent. Fig. 2 shows the F0 contours of the sentence (1) without the contrastive intonation (Fig. 2a) and with it (Fig. 2b).2

The visual scenes were prepared using commercial clip-art images. The position of objects was counter-balanced across the pictures. Four experimental lists were created following a Latin square design, each including 56 fillers. The 84 items in each list were presented in pseudo-random order with a constraint that at least two fillers preceded each experimental item. In addition, 12 comprehension questions were included.
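As a purely hypothetical illustration of such a design (not the authors' actual script; the labels and the greedy ordering strategy are ours), one might assign items to lists and enforce the filler constraint as follows:

```python
import random

# Four conditions from crossing Prosody (contrastive vs. neutral) with
# Visual Context (Contrastive vs. Non-Contrastive); labels are illustrative.
CONDITIONS = ["contrastive-prosody/contrastive-context",
              "contrastive-prosody/noncontrastive-context",
              "neutral-prosody/contrastive-context",
              "neutral-prosody/noncontrastive-context"]

def latin_square_lists(n_items=28, n_lists=4):
    """Rotate the 28 items through the four conditions so that each list
    contains every item exactly once and each condition equally often."""
    return [[(item, CONDITIONS[(item + shift) % n_lists]) for item in range(n_items)]
            for shift in range(n_lists)]

def order_with_filler_lead(experimental, fillers):
    """Greedy pseudo-randomization: shuffle both sets, then place two fillers
    immediately before every experimental item, which guarantees that at least
    two fillers precede each experimental item (28 items x 2 = 56 fillers)."""
    random.shuffle(experimental)
    random.shuffle(fillers)
    ordered, filler_iter = [], iter(fillers)
    for trial in experimental:
        ordered.extend([next(filler_iter), next(filler_iter)])
        ordered.append(trial)
    ordered.extend(filler_iter)  # any remaining fillers go at the end
    return ordered

lists = latin_square_lists()
example_order = order_with_filler_lead(lists[0], [f"filler-{i}" for i in range(56)])
```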

2.1.3. Procedure

Participants were first given a brief instruction and underwent a calibration procedure. They were told to listen to auditory sentences carefully while paying attention to the picture on the computer monitor. In each trial, an auditory sentence was presented 2500 ms after the picture onset.


Fig. 3. Probability of gazes to the RC-head entity from the RC verb onset to 1300 ms.

Table 1
Analysis of looks to the RC-head entity from 100 ms to 800 ms following the RC verb onset.

                              b        t        p
Intercept                   −2.94
Visual context               0.07     1.04     0.30
Prosody                      0.13     1.88     0.06
Prosody × Visual context     0.15     2.27     <0.05


Participants' eye-movements were recorded, while the picture was presented, with Eyelink Arm Mount (SR Research) at the sampling rate of 500 Hz. The whole experimental session took approximately 30 min.

3 We calculated the empirical logit using the function g′ = ln((y + 0.5) / (n − y + 0.5)), where y is the number of looks to the RC-head entity and n is the total number of looks to all the objects in a scene and background (Barr, 2008).
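As an illustration of this transformation, a minimal Python sketch follows; the function name and the example counts are ours, not the authors'.

```python
import math

def empirical_logit(y, n):
    """Empirical logit g' = ln((y + 0.5) / (n - y + 0.5)) (Barr, 2008), where y is
    the number of gaze samples on the RC-head entity and n is the total number
    of samples on all objects in the scene plus the background."""
    return math.log((y + 0.5) / (n - y + 0.5))

# Hypothetical counts for one trial: 12 of 35 samples in the window fall
# on the RC-head entity.
print(round(empirical_logit(12, 35), 3))  # -0.631
```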

3. Data analysis and results

The fixation coordinates from the eye tracker were mapped onto four entities in the visual scene and were then converted to gazes. We manually marked the onset and offset of the RC verb (notteita, 'had been riding'), those of the RC-head (onnanoko, 'girl'), and the onset of the case-marker for the RC-head (-o) in each target sentence. We first analyzed the gazes to the RC-head entity for the duration of the RC verb. Fig. 3 shows the probability of gazes to the RC-head entity from the RC verb onset until 1300 ms. The first vertical line marks the mean offset of the verb (822 ms, SD = 133) and the second line (dotted) the mean onset of the RC-head (1289 ms, SD = 131).

For the analysis, we summed the gazes to each object in the scene, which were sampled every 20 ms, for the 700 ms time interval of 100–800 ms following the RC verb onset and calculated the logit of looks to the RC-head entity out of looks to all the objects in a scene (including background).3 We then conducted statistical analyses using Linear Mixed Effects models (e.g., Baayen, Davidson, & Bates, 2008). We included Prosody (with or without contrastive intonation) and Visual Context (Contrastive or Non-Contrastive) as fixed effects with the interaction between the two factors allowed; participants and items were random factors. All the fixed factors were centered with deviation coding. We checked whether the model improved its fit by adding random slopes for each participant and item with a forward-selection approach. Table 1 shows coefficients (b), t-values, and their p-values or significance level from the model.
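To make this pipeline concrete, here is a minimal sketch in Python. The data file, column names, and the use of pandas/statsmodels are our assumptions for illustration (the authors do not name their software), and for simplicity the model includes only a by-participant random intercept rather than the full crossed participant and item random effects described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format gaze data: one row per 20-ms sample, with columns
# participant, item, prosody, context, time_ms (from RC-verb onset), on_rc_head (0/1).
samples = pd.read_csv("gaze_samples.csv")

# Keep the 100-800 ms window after the RC-verb onset and count, per trial,
# samples on the RC-head entity (y) out of all samples (n).
window = samples[(samples["time_ms"] >= 100) & (samples["time_ms"] < 800)]
trials = (window
          .groupby(["participant", "item", "prosody", "context"])
          .agg(y=("on_rc_head", "sum"), n=("on_rc_head", "size"))
          .reset_index())

# Empirical logit of looks to the RC-head entity (see footnote 3).
trials["elogit"] = np.log((trials["y"] + 0.5) / (trials["n"] - trials["y"] + 0.5))

# Centered deviation coding of the two two-level factors.
trials["prosody_c"] = np.where(trials["prosody"] == "contrastive", 0.5, -0.5)
trials["context_c"] = np.where(trials["context"] == "contrastive", 0.5, -0.5)

# Simplified linear mixed model: Prosody, Visual Context, and their interaction
# as fixed effects; random intercept by participant only in this sketch.
model = smf.mixedlm("elogit ~ prosody_c * context_c", trials, groups=trials["participant"])
print(model.fit().summary())
```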

The results showed that there was a marginal effect of Prosody (p = 0.06), suggesting a tendency for participants to look more at the RC-head entity with contrastive intonation than without. Most importantly, there was a significant interaction between Prosody and Visual Context. Separate analyses for each context type revealed that the effect of Prosody was significant in the Contrastive context (b = 0.28, t = 2.86, p < 0.01) but not in the Non-Contrastive context (b = −0.03, t = 0.28, p = 0.78). This demonstrates that participants looked more at the RC-head entity with contrastive intonation only in the Contrastive context. Furthermore, separate analyses for each prosody pattern revealed a significant effect of Visual Context with contrastive intonation (b = 0.22, t = 2.22, p < 0.05) but not without (b = 0.08, t = 0.91, p = 0.37). Since an additional analysis revealed no difference in the looks to the distractor/contrastive entity in this time interval across conditions (b = −0.02, t = 0.99, p = 0.33 for Visual Context, b = −0.00, t = 0.08, p = 0.94 for Prosody, b = −0.02, t = 0.87, p = 0.39 for their interaction), this provides evidence for an effect of visual context on the prediction of a syntactic structure, which has not been demonstrated previously.


Fig. 4. Probability of gazes to the RC-head entity from the RC-head onset to 1700 ms.

4 Another possibility is that syntactic analysis was delayed until the MC verb was encountered. Although this view is consistent with a head-driven parsing model (Pritchett, 1991), it is clear from Fig. 4 that gaze probabilities across conditions started to diverge before the MC verb onset, suggesting that analysis initiated prior to the verb.

Table 2
Analysis of looks to the RC-head entity from 100 ms to 800 ms following the case-marker onset.

                              b        t        p
Intercept                   −0.06
Visual context              −0.43    −4.50    <0.001
Prosody                     −0.02    −0.16    0.87
Prosody × Visual context    −0.21    −2.20    <0.05

5 The best-fit model included a by-item random slope for Prosody. p-Values were computed using likelihood-ratio (LR) tests.


In previous studies, the influence of referential context was observed only after a head NP and a following modifier were encountered but not in prediction (Spivey, Tanenhaus, Eberhard, & Sedivy, 2002; Tanenhaus et al., 1995; Trueswell, Sekerina, Hill, & Logrip, 1999). It is important to note that our finding cannot be an artifact due to the referential ambiguity for the RC-head entity in the Contrastive context (two girls in Fig. 1). If participants had difficulty in identifying the correct RC-head entity, there should have been fewer looks to the RC-head in the Contrastive context than in the Non-Contrastive context. Our results, however, showed the opposite pattern. It also does not explain the effect of prosody in the Contrastive context, as it occurred for the identical visual context.

We next analyzed the gazes following the RC-head onset to examine the influence of the prosodic cue at the disambiguating information. Fig. 4 shows the probability of gazes to the RC-head entity from the RC-head onset to 1700 ms. The first vertical line marks the mean onset of the head noun case-marker (565 ms; SD = 183), and the second line that of the MC verb (719 ms; SD = 197).

Following the same procedure, we analyzed the logit of looks to the RC-head entity, calculated using the same function as in the earlier analysis, for the duration of the RC-head noun (100–600 ms interval following the onset). The results showed no effects or interaction of the two factors (b = −0.01, t = 0.32, p = 0.76 for Visual Context, b = −0.01, t = 0.29, p = 0.81 for Prosody, b = −0.05, t = 1.46, p = 0.14 for their interaction). One possibility is that participants may have delayed structural (re)analysis until they heard a case-marker as it informs the grammatical role of the RC-head NP in a matrix clause. This is consistent with previous studies that showed that case-markers play a critical role in pre-head syntactic analysis in Japanese (e.g., Miyamoto, 2002).4 We therefore conducted another analysis on the logit of looks to the RC-head entity from 100 ms to 800 ms following the case-marker onset. The 700 ms interval was selected for compatibility with the earlier analysis for the RC verb duration. Table 2 summarizes the results.

The results showed a main effect of Visual Context, suggesting that participants looked at the RC-head entity more in the Non-Contrastive context than in the Contrastive context. This likely reflects the fact that there was only one entity that matches the RC-head noun in the Non-Contrastive context whereas there were two in the Contrastive context (i.e., two girls in the scene). This was supported by an additional analysis on the gazes to the distractor/contrastive entity for this duration, which showed a main effect of Visual Context (b = 0.72, t = 5.04, p < 0.001) but neither an effect of Prosody (b = 0.09, t = 1.15, p = 0.25) nor an interaction (b = −0.04, t = 0.52, p = 0.60).5 Most importantly, the analysis of looks to the RC-head entity for this time window revealed an interaction between Prosody and Visual Context.


Separate analyses for each context type showed that neither of the simple effects of Prosody reached significance, although they showed trends in opposite directions and the effect for the Contrastive context was somewhat stronger (b = 0.20, t = 1.45, p = 0.16 for the Non-Contrastive context; b = −0.23, t = 1.72, p = 0.08 for the Contrastive context).6 The coefficients suggest that in the Contrastive context, participants looked less at the RC-head entity with contrastive intonation than without, which inversely reflected more anticipatory looks to the same entity with contrastive intonation in the earlier time period. In contrast, they looked more at the RC-head entity with contrastive intonation than without in the Non-Contrastive context. This likely indicates that participants interpreted the cue as a simple emphasis in this context, leading to a stronger commitment to the MC analysis in the earlier time period, and that they were more surprised to hear the disambiguating information.

4. General discussion

Our study demonstrated that participants used contrastive intonation to predict a syntactic structure when processing RC sentences in Japanese. Crucially, the influence of the prosodic cue was observed only when the visual scene provided an appropriate context. In the Contrastive context, participants made more anticipatory eye-movements toward the RC-head entity immediately on hearing the RC verb when the RC had contrastive intonation than when it did not. This suggests that the RC analysis was accessed because the manipulated prosodic cue was interpreted in light of the appropriate visual context and not because the cue was marked and thus associated with a less preferred structural analysis (cf. Snedeker & Trueswell, 2003; Weber et al., 2006b).

Furthermore, our results also showed a late influence of the prosodic cue after disambiguating information was encountered. The pattern of results was the opposite of what was observed in prediction for the Contrastive context: In the Contrastive context, participants looked less at the RC-head entity when the RC had contrastive intonation than when it did not. This demonstrates that participants experienced less difficulty at the head as the RC structure was already anticipated. This indicates that the probability of the RC analysis was inversely correlated with its processing difficulty and thus provides empirical support for processing models that employ predictions to calculate processing cost (Hale, 2001; Levy, 2008). On the other hand, in the Non-Contrastive context, there were more looks to the RC-head entity following the disambiguating head NP when the RC carried contrastive intonation than when it did not.

6 In contrast with the further analyses, Fig. 4 appears to show a larger difference for the Non-Contrastive context than for the Contrastive context. This possibly reflects the difference in temporal location for the effect of Prosody between the two visual contexts; it occurs earlier in the Non-Contrastive context than in the Contrastive context. One possible reason is that the eye-movements in the Contrastive context relate to the shift away from the RC-head entity whereas those in the Non-Contrastive context relate to the shift toward it (hence the former occurring later than the latter).

This most likely reflects the stronger commitment to the MC analysis in the earlier time period; the prosodic cue in the absence of a contrastive pair was initially interpreted as a simple emphasis but not as contrastive, which corroborated the main clause analysis, and participants experienced more difficulty at the disambiguating information, resulting in more looks to the head entity.

To conclude, the current study provides evidence for the influence of contrastive intonation in both predicting and integrating the RC-head with the RC structure in Japanese. Our results demonstrate that listeners can use prosody in combination with visual context to make a structural prediction and also that such a prediction is related to the processing cost at the disambiguating information. Our study also provides the first evidence for pre-head structural prediction driven by prosodic and visual information in a head-final language.

Acknowledgments

We are grateful to Yuki Hirose and Edson T. Miyamoto for valuable comments on an earlier version of this manuscript, to Akira Utsugi for his help on preparation of auditory stimuli, and to Kiwako Ito for her advice on hand-coding of auditory sentences. We also thank Katherine Messenger for carefully proof-reading the manuscript.

References

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390–412.

Barr, D. J. (2008). Analyzing 'visual world' eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59, 457–474.

Cooper, R. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6, 84–107.

Inoue, A., & Fodor, J. D. (1995). Information-paced parsing of Japanese. In R. Mazuka & N. Nagai (Eds.), Japanese sentence processing. Hillsdale, NJ: Erlbaum.

Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the second meeting of the North American chapter of the Association for Computational Linguistics (pp. 159–166). Pittsburgh, USA.

Ito, K., & Speer, S. R. (2008). Anticipatory effect of intonation: Eye movements during instructed visual search. Journal of Memory and Language, 58, 541–573.

Ito, K., Jincho, N., Minai, U., Yamane, N., & Mazuka, R. (2012). Intonation facilitates contrast resolution: Evidence from Japanese adults and 6-year-olds. Journal of Memory and Language, 66, 265–284.

Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49, 133–156.

Kjelgaard, M., & Speer, S. (1999). Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. Journal of Memory and Language, 40, 153–194.

Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106, 1126–1177.

Marslen-Wilson, W. D., Tyler, L. K., Warren, P., & Lee, C. S. (1992). Prosodic effects in minimal attachment. Quarterly Journal of Experimental Psychology, 45, 73–87.

Mazuka, R., & Itoh, K. (1995). Can Japanese speakers be led down the garden path? In R. Mazuka & N. Nagai (Eds.), Japanese sentence processing (pp. 295–329). Hillsdale, NJ: Erlbaum.

Miyamoto, E. T. (2002). Case markers as clause boundary inducers in Japanese. Journal of Psycholinguistic Research, 31, 307–347.


Pritchett, B. L. (1991). Head position and parsing ambiguity. Journal of Psycholinguistic Research, 20, 251–270.

Schafer, A., Carter, J., Clifton, C., Jr., & Frazier, L. (1996). Focus in relative clause construal. Language and Cognitive Processes, 11, 135–163.

Schafer, A., Speer, S., Warren, P., & White, S. (2000). Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research, 29, 169–182.

Shirai, Y. (1998). Where the progressive and the resultative meet: Imperfective aspect in Japanese, Korean, Chinese and English. Studies in Language, 22, 661–692.

Snedeker, J., & Casserly, E. (2010). Is it all relative? Effects of prosodic boundaries on the comprehension and production of attachment ambiguities. Language and Cognitive Processes, 25, 1234–1264.

Snedeker, J., & Trueswell, J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48, 103–130.

Speer, S. R., Kjelgaard, M. M., & Dobroth, K. M. (1996). The influence of prosodic structure on the resolution of temporary syntactic closure ambiguities. Journal of Psycholinguistic Research, 25, 249–271.

Spivey, M. J., Tanenhaus, M. K., Eberhard, K. M., & Sedivy, J. C. (2002). Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution. Cognitive Psychology, 45, 447–481.

Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.

Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73, 89–134.

Uyeno, T., Hayashibe, H., Imai, K., Imagawa, H., & Kiritani, S. (1980). Comprehension of relative clause construction and pitch contours in Japanese. Annual Bulletin, Research Institute of Logopedics and Phoniatrics (University of Tokyo), 14, 225–236.

Venditti, J. (1994). The influence of syntax on prosodic structure in Japanese. OSU Working Papers in Linguistics, 44, 191–223.

Weber, A., Braun, B., & Crocker, M. W. (2006a). Finding referents in time: Eye-tracking evidence for the role of contrastive accents. Language and Speech, 49, 367–392.

Weber, A., Grice, M., & Crocker, M. W. (2006b). The role of prosody in the interpretation of structural ambiguities: A study of anticipatory eye movements. Cognition, 99, 63–72.