CogMaster – ENS / EHESS / Université Paris Descartes

Laboratoire de Sciences Cognitives et Psycholinguistique

Institut Jean Nicod

Experimental study of Primary and Secondary Scalar Implicatures

Anouk Dieuleveut

under the supervision of

Emmanuel Chemla and Benjamin Spector

June 6, 2015

Contents

Acknowledgments

Originality statement

Contribution statement

Abstract

1. Introduction: theoretical background on Scalar Implicatures

1.1. The two main accounts of Scalar Implicatures

1.1.1. The Gricean account: distinguishing three levels of reading

1.1.2. The Grammatical account

1.1.3. Comparison of the two accounts concerning the distinction between PSI and SSI

1.2. The experimental study of SI as a way to inform theoretical debates

1.2.1. Previous experiments and methodological considerations

1.2.2. Goal 1: a more fine-grained approach of SI

1.2.3. Goal 2: studying the processing patterns of PSI and SSI

1.2.4. Related studies

1.3. “Scalar diversity”

1.3.1. Not just testing SOME: why study a broader range of implicatures?

1.3.2. Informing theoretical debates on three expressions

1.3.3. Summing up

2. Work done during the internship

3. Experiment 1a: existence of primary and secondary implicatures for different scalar items

3.1. Goal

3.2. Method and materials

Experimental items

Procedure

3.3. Participants

3.4. Results

Analysis of responses

Readings by subjects

Response times (description)

4. Experiment 1b: control experiment

4.1. Goal

4.3. Participants

4.4. Results

Analysis of responses

Readings by subjects

Response times

4.5. Conclusion

Discussion for Experiment 1a and 1b

5. Experiment 2: dual-task impact on primary and secondary implicatures

5.1. Goal

Truth Value Judgment Task

Letter Memory Task

Procedure

5.3. Participants

5.4. Results

Letter Memory Task

Truth Value Judgment Task

5.5. Discussion

6. General discussion

Appendices

1. Abbreviations

2. Pilot Experiment

3. Experiment 2: testing the influence of presenting the stronger alternative

4. Displayed cards depending on the condition (Experiment 1a, 1b and dual-task)

5. Number of trials by condition (Experiments 1a, 1b and dual-task)

6. Instructions (experiment 1a and 1b)

7. Instructions (dual task)

8. Comparison between experiments

References

Acknowledgments

First, I want to thank Benjamin Spector and Emmanuel Chemla, who were great supervisors and advisors. I was lucky to work with them.

I also warmly thank Alexandre Cremers for helping me with IBEX and the statistical analysis, and Florian Pellet for helping me with IBEX and with launching the experiments on Mechanical Turk.

I also thank the members of the ghost Writing Group, with a special mention to Mora and Adriana.

Last, I thank Juliette for her relevant questions and remarks during the first semester, Dominique for

the corrections, Floriane and Iryna for their patience and constant good mood, and Aymeric for his

general support.

Originality statement

This work is original for three main reasons:

First, we implement a paradigm that makes it possible to distinguish experimentally between three levels of reading for standard scalar items (literal meaning, primary scalar implicature and secondary scalar implicature). To our knowledge, this had not been done before: most experimental studies of SI contrast only two levels of reading (literal meaning and scalar implicature). Establishing the existence of primary and secondary scalar implicatures is a way to link linguistic theories to experimental data, and to inform the current theoretical debate between the Gricean and the Grammatical accounts of SI. In particular, our experimental design enables us to test the Gricean account on a specific point of the theory: the role of the Competence Assumption in the “Epistemic Step”.

Second, we use this paradigm to study three debated cases of scalar implicature: numeral quantifiers, the plural morpheme and the modifier almost. We compare their behavior to the standard case of SOME with respect to the distinction between primary and secondary scalar implicatures. Until now, there were nearly no experimental data on the modifier almost, and it had not even been shown that its literal reading exists. Our results strongly support the SI account. Numeral quantifiers have already been studied extensively: our results add to previous findings, revealing a new kind of difference between numerals and standard scalar items. Last, the plural morpheme has not been much investigated in the experimental literature: our results suggest that, like numerals, it differs from standard scalar items, at least regarding the distinction between primary and secondary scalar implicatures.

Last, we study the effect of implementing a dual-task on primary and secondary implicatures.

Contribution statement

This internship was jointly supervised by Benjamin Spector and Emmanuel Chemla.

Here are the main contributors:

Choice and definition of the scientific issue: B. Spector, E. Chemla, A. Dieuleveut

Bibliographical review: A. Dieuleveut, B. Spector, E. Chemla

Development of the methodology:

- Pilot: idea: B. Spector, E. Chemla

Implementation on IBEX: A. Dieuleveut, A. Cremers (SubHtml)

- Improvements of the pilot, experiments 1a and 1b: A. Dieuleveut, E. Chemla, B. Spector

Implementation on IBEX: A. Dieuleveut

- Dual task experiment: idea: B. Spector, E. Chemla, based on a study by P. Marty

Implementation on IBEX: A. Dieuleveut, F. Pellet (feedback), A. Cremers (randomization)

- Testing participants (launching the experiments via Mechanical Turk): F. Pellet, A. Cremers

Data analysis: A. Dieuleveut, E. Chemla, A. Cremers

Interpretation of the results: A. Dieuleveut, B. Spector, E. Chemla

Writing of the thesis, tables and figures: A. Dieuleveut

Corrections of the thesis:

- orthography: D. Juffin

- content: B. Spector, E. Chemla

Abstract

When you hear a sentence such as “Some of the cards are hearts”, you tend to understand that “it is not the case that all cards are hearts”, even though the sentence with SOME is logically true if all cards are hearts. This kind of linguistic inference is called a scalar implicature. To date, several accounts of this phenomenon have been proposed, mainly concerned with the nature of the mechanism at stake.

This experimental work addresses the current theoretical debate between the Gricean and the Grammatical accounts of scalar implicatures by studying the distinction between primary and secondary scalar implicatures. As proposed by Sauerland (2004), we can theoretically distinguish three levels of understanding for a sentence such as “Some of the cards are hearts”: (i) “the speaker believes that SOME (and possibly ALL) cards are hearts” (literal reading), (ii) “it is not the case that the speaker believes that ALL cards are hearts” (primary reading) and (iii) “the speaker believes that it is not the case that ALL cards are hearts” (secondary reading). According to the Gricean account, these three readings are accessed incrementally, with the final step from primary to secondary implicature (known as the “Epistemic Step”) depending on the assumption that the speaker is “opinionated”, called the “Competence Assumption”.

Based on a paradigm manipulating the informational state of a fictional speaker, we show with three web experiments that, for the paradigmatic example of SOME, these three levels of reading can be distinguished. Furthermore, we show that the computation of the secondary reading does not depend solely on the Competence Assumption: it can be accessed even when the speaker is not fully informed.

Three other debated cases of SI are tested and compared to SOME using the same paradigm:

ALMOST, NUMERALS and PLURAL. We show that the behavior of ALMOST is highly similar to SOME,

supporting an SI account of ALMOST. Regarding PLURAL and NUMERALS, only two levels of reading

could be distinguished, namely, the literal reading and the secondary reading, suggesting that the

mechanism underlying the computation may differ between these items and standard scalar items.

In order to test the processing properties of these three levels of reading, we further implemented a dual-task version of the experiment, as a way to better inform the debate between the Gricean and the Grammatical accounts, since both can predict the existence of three distinct readings. This part is not conclusive yet, but deserves deeper investigation.

1. Introduction: theoretical background on Scalar Implicatures

1.1. The two main accounts of Scalar Implicatures

1.1.1. The Gricean account: distinguishing three levels of reading

When you hear a sentence such as “Some of the cards are hearts”, you tend to understand it as meaning that “Some but not all of the cards are hearts”, even though, logically speaking, the sentence is true when all cards are hearts. This kind of linguistic inference is called a Scalar Implicature (abbreviated SI from now on [1]). SI are a particular case of conversational implicatures, a concept introduced by Grice (1975): these are inferences that, instead of coming directly from the linguistic meaning of a sentence, result from pragmatic reasoning that takes into account the communicative intentions of the speaker.

The debate about SI has mostly centered on the question of how these inferences come about.

Today, the dominant view is the Gricean account (Grice, 1975; Horn, 1972; Gazdar, 1979; Spector,

2003; Sauerland, 2004; van Rooij & Schultz, 2004; Russell, 2006, among others). Crucially, the

computation of an SI involves the comparison of the sentence actually uttered with a minimally

different sentence, called its scalar alternative, that the speaker could have uttered in the same

situation.

More precisely, the Gricean account distinguishes the following steps in the computation of an SI:

When you hear a sentence (S) containing a scalar item, for example “Some of the cards are hearts”:

1. First, you compute the literal meaning of (S), written as follows:

[[S]]: “Some, and possibly all, cards are hearts.”

Applying the Gricean maxim of quality [2], you understand that “the speaker believes that S is true”, which we will write B(S). (B(X) stands for “the speaker believes that X”.)

2. Then, you compare the uttered sentence (S) with an alternative to this sentence, (S’), i.e. a sentence that the speaker could have chosen in such a situation. Alternatives are obtained by replacing, in the sentence, the scalar term by other expressions belonging to the same scale (in our example, <SOME, ALL>).

(S’): “All of the cards are hearts.”

The adjective “scalar” comes from the role of these scales in the derivation. Importantly, the members of a scale can be ordered according to their logical strength: a hearer can thus compare (S) and its alternative (S’) in terms of informativeness. In our example, (S’) asymmetrically entails (S): uttering (S’) would be more informative than uttering (S).

3. Assuming that the speaker is cooperative and gives as much information as she can (following the Gricean maxim of quantity), you can infer that it is not the case that she believes that (S’) is true. Indeed, if she had held this belief, she would have used (S’) instead of (S).

This strengthened meaning is called the Primary Scalar Implicature (PSI from now on).

¬B(S’): “it is not the case that the speaker believes that all of the cards are hearts.”

4. According to most recent Neo-Gricean accounts (Sauerland, 2004; Spector, 2003, a.o.), you can go further in the computation: assuming that the speaker is well-informed, you can infer that the speaker believes (S’) to be false.

This strengthened meaning is called the Secondary Scalar Implicature (SSI).

B(¬S’): “the speaker believes that it is not the case that all of the cards are hearts.”

[1] The main abbreviations used in this work are presented in Appendix 1.

[2] In his William James lectures, Grice proposed that conversation rests on a “principle of cooperation”, which can be characterized by several maxims of conversation that speakers are supposed to follow. Two of the four Gricean conversational maxims are relevant to our purpose here:

- Maxim of quality: say only what you know to be true.

- Maxim of quantity: be as informative as needed.

The step from PSI to SSI is called the Epistemic Step (Sauerland, 2004) [3]. According to a strict version of the account, it relies on the assumption that the speaker is well-informed (or “opinionated”), called the Competence Assumption: the speaker is knowledgeable regarding the truth value of (S’) (Geurts, 2009).

Formally, the Competence Assumption corresponds to B(S’) ∨ B(¬ S’).
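To make the Epistemic Step explicit, the inference can be written out as a short derivation. This is only a sketch in the notation used above (B for speaker belief); the labelling of the last step as a disjunctive syllogism is our gloss, not a formulation taken from the cited works.

```latex
\begin{align*}
  & \neg B(S')                && \text{(PSI, from the maxim of quantity)} \\
  & B(S') \lor B(\neg S')     && \text{(Competence Assumption)} \\
  \therefore\quad & B(\neg S') && \text{(SSI, by disjunctive syllogism)}
\end{align*}
```

Without the second premise, the first premise alone leaves open that the speaker has no opinion about (S’), which is why the strict Neo-Gricean schema ties the SSI to the Competence Assumption.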

Table 1 summarizes the three levels of reading distinguished in the standard Neo-Gricean account.

Table 1. Three levels of reading for scalar items.

Reading   Name                           Formula          Gloss
LIT       Literal meaning                B(S)             The speaker believes that (S) is true.
PSI       Primary Scalar Implicature     B(S) ∧ ¬B(S’)    The speaker believes that (S) is true and it is not the case that the speaker believes that (S’) is true.
SSI       Secondary Scalar Implicature   B(S) ∧ B(¬S’)    The speaker believes that (S) is true and the speaker believes that (S’) is not true.

Example: (S) = “Some of the cards are hearts.”

(S’) = “All of the cards are hearts.”

(S’) asymmetrically entails (S).

[3] The distinction between PSI and SSI was first introduced by Sauerland (2004) in order to account for the case of disjunction (scale <OR, AND>), which we will not explain in detail here.
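The three readings can be illustrated with a small toy model. The sketch below is not the experimental implementation: it assumes, for illustration only, that the speaker's information state is a row of cards, some of which may be face-down (None), and that B(X) holds when X is true in every arrangement of cards compatible with what the speaker sees. The names SUITS, compatible_worlds, believes and readings are hypothetical.

```python
from itertools import product

# Toy assumption: only two suits matter for the example sentence.
SUITS = ("hearts", "spades")


def compatible_worlds(visible):
    """All card arrangements consistent with what the speaker sees.

    `visible` holds one suit per card, or None for a face-down card.
    """
    options = [SUITS if card is None else (card,) for card in visible]
    return list(product(*options))


def some_hearts(world):
    """Literal meaning of (S): at least one card is a heart."""
    return any(card == "hearts" for card in world)


def all_hearts(world):
    """Stronger alternative (S'): every card is a heart."""
    return all(card == "hearts" for card in world)


def believes(worlds, prop):
    """B(prop): prop is true in every world the speaker considers possible."""
    return all(prop(world) for world in worlds)


def readings(visible):
    """Truth of LIT, PSI and SSI for a speaker who sees `visible`."""
    worlds = compatible_worlds(visible)
    lit = believes(worlds, some_hearts)                             # B(S)
    psi = lit and not believes(worlds, all_hearts)                  # B(S) and not B(S')
    ssi = lit and believes(worlds, lambda w: not all_hearts(w))     # B(S) and B(not S')
    return {"LIT": lit, "PSI": psi, "SSI": ssi}


if __name__ == "__main__":
    for cards in (["hearts", "spades", "hearts"],
                  ["hearts", "hearts", None],
                  ["hearts", "hearts", "hearts"]):
        print(cards, readings(cards))
```

In this toy model, a speaker who sees two hearts and one face-down card satisfies LIT and PSI but not SSI, which is the kind of situation needed to tell the three readings apart.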

We have so far used the scale <SOME, ALL> as an example, but this reasoning can be applied to many other scales, called Horn scales (Horn, 1972): other quantifiers (e.g. <FEW, NONE>), connectives (e.g. <OR, AND>), numeral quantifiers (<ONE, TWO, THREE, …>), verbs, adjectives, etc. The reasoning can even be applied to contextual scales (Hirschberg, 1985), but we will not address this point here.

1.1.2. The Grammatical account

The Gricean account has recently been challenged by another approach, the Grammatical approach

(Chierchia 2004, 2006; Chierchia, Fox & Spector, 2008, 2009, 2012; Fox 2007; Landman, 1998).

According to this account, the mechanism that gives rise to SI is not pragmatic but grammatical in

nature. It relies on the application of a phonologically null grammatical operator, often written O

because its meaning is similar to the meaning of the word only. Crucially, O insertion is a syntactic

process.

Under this view, a sentence (S) such as “Some of the cards are hearts” is structurally ambiguous between two parses:

- Parse without the operator:

[[S]]: “Some, and possibly all, cards are hearts.”

One may further apply pragmatic mechanisms (maxim of quality) and obtain B([[S]]), which corresponds to the literal meaning under the Gricean approach.

- Parse with the operator O:

[[O(S)]]: “Only some of the cards are hearts.”

This parse leads to B([[O(S)]]), which corresponds to the SSI:

B(O(S)) = B(S ∧ ¬S’) (“only some” means that “some, but not all, cards are hearts”.)

The listener has to make a disambiguation choice between the two parses. The factors that play a

role in this choice are not clearly specified by the theory, but evidence that the speaker is

knowledgeable or opinionated is one of them (Fox, 2007).

Under this approach, the SSI is not derived from the PSI. The PSI involves a pragmatic reasoning

similar to the Gricean mechanism, and can be obtained after the disambiguation choice.
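The effect of the operator O can be sketched in a few lines. This is a deliberately simplified illustration, not the definition used in the Grammatical literature: it assumes that a world is just the number of hearts among four cards, and that O negates every alternative that asymmetrically entails the sentence (more refined definitions exist, but they are not needed for <SOME, ALL>). The names N_CARDS, WORLDS, entails and exhaustify are hypothetical.

```python
# Toy assumption: a world is the number of hearts among N_CARDS cards.
N_CARDS = 4
WORLDS = range(N_CARDS + 1)


def some(k):
    """(S): at least one card is a heart."""
    return k >= 1


def all_(k):
    """(S'): all cards are hearts."""
    return k == N_CARDS


def entails(p, q, worlds=WORLDS):
    """p entails q if q is true in every world where p is true."""
    return all(q(w) for w in worlds if p(w))


def exhaustify(sentence, alternatives, worlds=WORLDS):
    """Simplified O: the sentence is true and every strictly stronger
    alternative is false."""
    stronger = [alt for alt in alternatives
                if entails(alt, sentence, worlds)
                and not entails(sentence, alt, worlds)]
    return lambda w: sentence(w) and not any(alt(w) for alt in stronger)


only_some = exhaustify(some, [all_])

# Worlds where each parse is true.
literal = [k for k in WORLDS if some(k)]
strengthened = [k for k in WORLDS if only_some(k)]

if __name__ == "__main__":
    print("parse without O:", literal)
    print("parse with O:   ", strengthened)
```

The parse with O excludes exactly the world in which all cards are hearts, i.e. it corresponds to S ∧ ¬S’, with no reference to the speaker's beliefs at this stage.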

Other accounts of SI have been proposed (see Sauerland, 2012, for a recent summary), but we will not present them in detail here. We will occasionally refer to the default theory, according to which SI are generated automatically and can then be cancelled (Levinson, 2000). As this theory is not supported by experimental results, we will not develop it further here.

1.1.3. Comparison of the two accounts concerning the distinction between PSI and SSI

The two accounts presented so far agree on some facts, notably the role of alternatives in the derivation and the existence of Horn scales, as well as certain linguistic properties of SI such as cancelability (often considered the hallmark of SI). The debate essentially bears on the division of labor between semantics and pragmatics in the derivation of SI, and more specifically on the nature of the mechanism by which (S’) is negated. According to the Gricean account, this mechanism is pragmatic in nature: it requires taking into account the intentions and beliefs of the speaker. According to the Grammatical account, the mechanism is primarily grammatical, and you do not necessarily have to take into account the mental states of the speaker to compute a scalar implicature.

In particular, these accounts differ on the status of the different readings we have distinguished. This is what we will focus on, as a way to help tease apart the two theories [4]. The status of the three readings is summed up in Figure 1.

(a) In the Neo-Gricean account of SI, the derivation of SI is incremental: you first compute LIT, then you optionally derive PSI, and then you optionally derive SSI if the Competence Assumption holds.

(b) In the Grammatical account, PSI are derived at the end, after the decision whether or not to apply the O operator. PSI are pragmatic and SSI are grammatical.

NB: Under the Gricean account, you cannot derive SSI if the speaker is not well-informed, because you compute SI according to the mental state of the speaker.

Under the Grammatical account, it is possible to derive SSI as a possible reading of the sentence even if the speaker is not a priori taken to be well-informed, although it would not be the preferred parse.

[4] There are other important theoretical differences between the two accounts. In particular, under the Gricean view, implicatures are computed globally (at the level of the sentence) and depend on a general cognitive system, whereas under the Grammatical account, implicatures can be computed locally and depend on a specific cognitive system, grammar.

Figure 1: Comparison between Gricean and Grammatical accounts for PSI and SSI (simplified version of Chemla & Singh, 2013).

Neo-Gricean Account

Let (S) be a sentence containing a scalar item.

(1) Applying the maxim of quality, you obtain that the speaker believes the literal meaning, B(S).

(2) Scalar alternatives (S’) are obtained by replacing the scalar item by other members of the scale.

(3) Applying the maxim of quantity, we infer that ¬B(S’), otherwise the speaker would have uttered (S’).

(4) Assuming that the speaker is well-informed (Competence Assumption), we can strengthen the meaning into B(¬S’).

Grammatical Account

Let (S) be a sentence containing a scalar item.

(1) We have the choice to apply or not a phonologically null operator, written O, which has a meaning similar to only.

(1a) Parse without the operator: B(S)

(1b) Parse with the operator: B(O(S)) = B(S ∧ ¬S’)

(2) We can further obtain PSI by Gricean reasoning.

1.2. The experimental study of SI as a way to inform theoretical debates

1.2.1. Previous experiments and methodological considerations

Experimental studies of SI have expanded considerably over the last decade, and have proven to be a useful tool to inform theoretical debates. Diverse methodologies have been used, both with adults, including truth value judgment tasks, response-time studies (Bott & Noveck, 2004; Noveck & Posada, 2003), self-paced reading (Breheny, Katsos & Williams, 2006; Bergen & Grodner, 2012), mouse-tracking (Tomlinson, Bott & Bailey, 2011), eye-tracking (Huang & Snedeker, 2009; Grodner, Klein, Carbary & Tanenhaus, 2010) and dual-task paradigms (De Neys & Schaeken, 2007; Dieussaert, Verkerk, Gillard & Schaeken, 2011; Marty & Chemla, 2013; Marty, Chemla & Spector, 2013), and with children (Smith, 1980; Noveck, 2001; Papafragou & Musolino, 2003, among others).

We will only present the ones that are relevant to our purpose here.

One of the first experiments assessing SI was conducted by Bott & Noveck (2004). Based on previous

paradigms implemented by Rips (1975) and Smith (1980), it consisted in a sentence-verification task:

participants were presented with under-informative sentences such as “Some elephants are

mammals” and had to indicate whether the sentence was true or false. Crucially, this kind of

sentence is true if you compute the literal meaning, and false if you compute the implicature. This

was a way to establish the existence of two distinct levels of reading.

1.2.2. Goal 1: a more fine-grained approach of SI

Many experimental studies of SI are based on a Truth Value Judgment Task paradigm similar to that of Bott & Noveck, which makes it possible to distinguish the literal meaning (LIT) from the implicated meaning (SI). They treat SI as a whole, without distinguishing PSI from SSI.

Our main goal was to show experimentally that, following the theoretical literature, three levels of

reading (LIT, PSI and SSI) could be distinguished for standard scalar items. Following the paradigm of

Bott & Noveck, this meant finding a case where PSI and SSI gave rise to different answers, i.e. a case

in which the PSI was true, but not the SSI.

One way to achieve this is to manipulate the level of information of the speaker: in a context where the speaker is ignorant about the truth value of the alternative, a sentence like “Some cards are hearts” will be judged appropriate under a PSI reading, whereas under an SSI reading it will be judged inappropriate, because the speaker does not have enough information to assert it.

We will come back in greater detail to the paradigm we implemented in Chapter III.

Distinguishing the SSI from the PSI is a way to test the Gricean account of SI, which clearly postulates that the three readings exist and are derived incrementally. Our paradigm further makes it possible to test the role of the Competence Assumption in the “Epistemic Step”. The underlying reasoning is close to one proposed by Fox (2014) to test the involvement of the maxim of quantity in the Gricean account. In a thought experiment, he considers a situation (a TV game show) where the speaker is uncooperative and does not follow the maxim of quantity, i.e. does not give all the information: according to a strict Gricean account, SI should no longer arise in such a context, as they crucially rely on the maxim of quantity. The rationale underlying this paradigm is relevant for us: blocking the Competence Assumption and seeing whether SSI still arise is a way to test its role. If the strict version of the Neo-Gricean schema is correct, there should be no SSI when the speaker is not well-informed.

1.2.3. Goal 2: studying the processing patterns of PSI and SSI

Importantly, both the Gricean and the Grammatical accounts can predict the bare existence of the

three readings: they really differ on the question of the relations between them (Chemla & Singh,

2013). The Gricean account clearly states that the SSI is derived from the PSI, in an incremental way,

whereas in the Grammatical account, there is an ambiguity between LIT and SSI, the PSI being

derived afterwards, with different mechanisms. Do empirical data support the latter or the former

view? This was the second question we wanted to answer.

Diverse methodologies have shown that there is a cost associated with SI computation. In particular, the study of response times in classical truth value judgment tasks (Bott & Noveck, 2004; De Neys & Schaeken, 2007; Noveck & Posada, 2003; Rips, 1975, a.o.) showed that, for target sentences, participants who judged the statements to be false were slower than those who judged them to be true. Similarly, the rate of SI increased as the permitted response time increased. This suggests that SI is derived from LIT, in an incremental way, as in the Gricean account. This finding was confirmed using other methodologies (dual-task paradigms, eye-tracking studies, mouse-tracking studies): SI are derived with a delay (Bott, Bailey & Grodner, 2012).

As noted above, these studies only distinguish SI and LIT. Our second goal was therefore to study the processing properties of the different levels of reading, which is interesting in two respects: it is a way to inform the theoretical debate between the Gricean and the Grammatical accounts, and it is also a way to better understand the exact nature of the cost observed for SI.

The processing patterns of PSI and SSI could be compared using several methodologies. We first aimed to study response times but, possibly for technical reasons linked to the fact that we ran the experiments online, our results were too noisy. Hence, we turned to another paradigm: a dual-task experiment, which we will present in detail in Chapter IV.

1.2.4. Related studies

To our knowledge, it has never been shown straightforwardly that three levels of reading exist: the PSI and the SSI are almost always confounded in experimental studies. However, the distinction between PSI and SSI is sometimes addressed in the experimental literature [5], specifically in studies that manipulate the level of information of the speaker.

In particular, using a self-paced reading paradigm, Bergen & Grodner (2012) indirectly address our point. The aim of their study differs from ours: it is to show that the speaker’s knowledge influences SI computation. As they use the distinction between PSI and SSI (which they call Weak SI and Strong SI, respectively) and manipulate the level of information of the speaker, their study is tightly linked to our work, which is why we briefly present it here.

They implement a self-paced reading paradigm based on the following reasoning. Each trial consists of three sentences: a context sentence, a trigger sentence and a continuation sentence. The context sentence makes it possible to manipulate the level of information of the speaker (e.g. Full-knowledge: “I meticulously compiled the investment report.” vs. Partial-knowledge: “I skimmed the investment report.”). The trigger sentence is: “Some of the real estate investments lost money.” The continuation sentence is: “The rest were successful despite the recent economic downturn.”

The dependent variables are the reading times on SOME and on THE REST. A longer reading time on

SOME indicates that the reader derives the implicature (which is costly), and a shorter reading time on THE REST

indicates that the reader has previously derived the strong scalar implicature (reading is easier because the

referent, the “complement set”, has already been accessed: in the example, “a set of investments

that did not lose money”).

If the level of information of the speaker has an effect, the reading times on

SOME should be longer (and, conversely, shorter on THE REST) in the Full-Knowledge condition than

in the Partial-Knowledge condition, indicating that the strong implicature has been derived. This is

indeed what they found.

5 Ignorance inferences: Although they are not our focus, another kind of inference has also been discussed in the

literature: ignorance inferences (also called uncertainty inferences) (Chemla & Singh, 2013; Hochstein, Bale, Fox & Barner,

2014; Fox, 2014, among others). As they are closely related, we will briefly explain how they differ from our concern. These

inferences correspond to the reading “the speaker is ignorant about the truth value of the stronger alternative (S’)”.

Importantly, Ignorance Inferences are not equivalent to PSI, even if they are occasionally collapsed in the literature:

formally, PSI correspond to B(S) ∧ ¬B(S’), whereas Ignorance Inferences correspond to B(S) ∧ ¬B(S’) ∧ ¬B(¬S’). Under the

Gricean account, Ignorance Inferences result from a strengthening mechanism from the PSI, when the Competence

Assumption cannot be made. However, it is perfectly possible that the meaning is not strengthened from the PSI.

This experiment is interesting in two respects. First, it shows that the knowledge state attributed to

the speaker influences the likelihood of deriving an implicature: more implicatures are derived in the

“full-knowledge” condition. Second, and more linked to our purpose, their paradigm may enable us

to establish the existence of SSI per se. Indeed, the decrease of reading time on THE REST is explained

by the fact that the participant already accessed the “complement set”, which may not be the case if the

participant only derived the PSI but not the SSI. However, this can be questioned: deriving the PSI

may also get the participant closer to the belief that the “complement set” is not empty: the PSI

could also explain the difference found.

In any event, this paradigm does not enable us to distinguish between three levels of reading. The

results do not show either that PSI exist: the difference in reading times can be explained by opposing

LIT to SSI. It could be interesting to know whether there are cases in which there is a difference of

reading times on SOME, but not on THE REST - which could indicate an access to the PSI but not to

the SSI - but given that it is not the focus of their experiment, this question is not addressed by the

authors.

We will briefly present a second paradigm that has striking similarities with ours. In a study assessing

the computation of SI in autistic people, Hochstein & Barner (unpublished) use a Partial-Knowledge

task as a test of Epistemic Reasoning (Experiment 3). The principle of the paradigm is the following.

There are three boxes. The speaker can be either knowledgeable (knowing the content of all boxes)

or ignorant (knowing the content of only two out of the three boxes). A sentence is uttered:

“Some/Two/All of the boxes have strawberries”. The question asked is: “Do you think there are

strawberries in this box?” The case with the knowledgeable speaker makes it possible to distinguish LIT (“I

don’t know” answers) and SI (“no” answers): this is the classical condition to test SI.

When the speaker is ignorant, the expected answer is “I don’t know”: subjects are not licensed to

make the implicature, as the speaker is not in a position to know whether all of the boxes have

strawberries. They find that in this last condition, autistic people tend to answer “no” for sentences

containing SOME, which they interpret as showing that they are more likely to compute SI in

incorrect contexts (i.e. without epistemic justification). However, their paradigm does not show that

autistic people have derived the SSI but only that they do not take into account the level of

information of the speaker when interpreting the sentence. Indeed, they do not test the ignorance

condition with “All” or “Only some”, which would make it possible to see if the incorrect answer is specific to

the SI computation, or if it is a general deficit in taking into account the speaker information level.

Their paradigm does not enable us either to distinguish between three different levels of reading.

The paradigm we shall present is very close to this one, but crucially differs on the question asked to

the participant.

1.3. “Scalar diversity”: to which linguistic expressions does the scalar enrichment

mechanism apply?

In the first part of this introduction, we have presented the debate between the Gricean and the

Grammatical accounts, which concerns the division of labor between grammar and pragmatics in SI

computation. An orthogonal question to this is the following: how general is this phenomenon?

Which linguistic expressions can be considered as scalar items?

Our third goal was to use our paradigm to inform this question, as a “diagnostic” tool for three

debated cases: NUMERALS, PLURAL, and ALMOST.

1.3.1. Not just testing SOME: why study a broader range of implicatures?

As pointed out by van Tiel et al. (2014), the experimental literature on SI has mainly focused on the

example of SOME6. It is important not to limit our investigations to this paradigmatic case for at least

two reasons. First, studying other scalar items is a way to make our conclusions more generalizable

and to avoid the criticism that they could be explained by a specificity of the word SOME. Second,

experimental studies comparing the behavior of different scalar items have shown that they are

not equivalent in many respects: the rate of derivation can differ substantially from one scalar item to

another (van Tiel, 2014), and, importantly, their processing characteristics have also been shown to

be different, as we will see below.

1.3.2. Informing theoretical debates on three expressions

As the case of SOME, often considered as the paradigmatic case of SI, has been extensively studied,

we will use it as a baseline. We are now going to present three cases that can be accounted for with

a theory of SI, but for which this approach is controversial.

1.3.2.1. Numerals quantifiers

There is an ambiguity for a sentence such as “n cards are hearts” (with, for example, n=2): it can

mean “at least two cards are hearts” or “exactly two cards are hearts”. It has been proposed (Horn,

1972; see Spector, 2013, for a recent summary) to analyze numerals as a case of SI. Under this

account, the literal meaning is “at least n cards are hearts”. The scale to consider is <ONE, TWO,

THREE, …> (“at least n+1” asymmetrically entails “at least n”). The implicature, “exactly n

cards are hearts”, is obtained by a strengthening mechanism akin to the one presented in 1.1.1.

6 The disjunction (scale: <OR, AND>) has also been the target of many studies, but we shall not address this case here.

The case of NUMERALS is one of the most studied cases after SOME. Actually, there is a great deal of

evidence suggesting that they behave differently. First, this has been shown studying the syntactic

distribution of the assumed “implicated” (“exactly”) reading, compared to SOME (Horn 1992;

Breheny, 2008). More recently, acquisition studies have shown that children acquire the NUMERALS

scale earlier than other scales: whereas children are known to acquire SI quite late, Papafragou &

Musolino (2003) found that 66% of 5-year-old children accessed the strengthened reading for

NUMERALS, whereas only 12.5% did for SOME (see also Huang & Snedeker, 2009). Making the

assumption that this is explained by a problem in the access to scalar alternatives (see Barner &

Bachrach, 2010, Bale & Barner, 2013, a.o.), this result can be accounted for within a theory of

NUMERALS as SI, but it can also be taken to show that NUMERALS have an “exactly” lexical meaning.

Last, this conclusion was supported by processing studies using eye-tracking (Huang &

Snedeker, 2009a) or a dual-task paradigm: Marty et al. (2013) found that tapping memory resources

had opposite effects on SOME and on NUMERALS (under high cognitive load, participants derived

fewer SI for SOME, but more “exactly” readings for NUMERALS, suggesting that the “basic” meaning

of numerals is the “exactly” meaning).

1.3.2.2. Plural

The plural/singular distinction is the source of a long-standing debate in semantics. Intuitively, it

seems that the meaning of the plural morpheme is “strictly more than one” (Lasersohn, 1995).

However, some linguistic observations suggest that this definition is not sufficient: there are contexts

in which the plural morpheme can be interpreted as meaning “at least one”, e.g. under negation: a

sentence such as “There are no cards on the desk” will be considered false even if there is only one

card on the desk.

It has been proposed to treat the “strictly more than one” component of the meaning of the plural

morpheme as an implicature rather than an inherent part of its semantics (Sauerland, Anderssen &

Yatsushiro, 2005; Spector, 2007; Zweig 2008). According to this account, the literal meaning of the

plural morpheme is “at least 1”. Simplifying somewhat7, the scale to consider is <PLURAL,

SINGULAR> (SINGULAR meaning “exactly one”, thus asymmetrically entailing PLURAL). The

implicature is thus “at least 2”.

There are few experimental studies on this case. Pearson, Khan & Snedeker (2011) found that under

certain circumstances, it was possible to cancel the “more than 1” meaning component (one of the

hallmarks of implicatures), even if this cancelation is quite difficult to obtain. This has also been

investigated in children (Tieu, Bill, Romoli & Crain, 2014).

7 To be more precise, the plural case can be viewed as a case of Higher Order Implicature (see Spector, 2007, for details).

1.3.2.3. Almost

Last, there is a debate on the semantics of the modifier ALMOST. Intuitively, the meaning of

“ALMOST X” (with for example X = ALL or X=NO) seems to be that “X is close to being true, but is in

fact not true”.

This “not X” part of the meaning of ALMOST has been analyzed as a case of SI (Sadock, 1981; Spector,

2014). The reasoning is similar to the reasoning for SOME: the literal meaning is “almost X and

possibly X”; the scale to consider is <ALMOST X, X> and the implicature is “almost X but not X”

(otherwise the speaker would have said X).

However, this account of the meaning of ALMOST is far from consensual: in particular, the

“not X” part of the meaning has also been analyzed as an entailment (Hitzeman, 1992; Horn, 2011;

Kilbourn-Ceron, 2015).

To date, there are no real experimental data on this question. Establishing the existence of the literal

reading for ALMOST would be a strong argument against the entailment view and in favor of the SI

account.

1.3.3. Summing up

From a theoretical point of view, we can hold a consistent account of these three cases in terms of SI.

One way to inform the theories is to confront them with the data: can we distinguish three levels of

reading – LIT, PSI and SSI – for NUMERALS, PLURAL and ALMOST? We aimed to investigate the extent

to which these scalar items behave similarly to SOME, considered as the paradigmatic case

of SI.

Importantly, the question of the differences between scalar items can be linked to the debate

concerning the nature of the mechanism that gives rise to SI: the mechanisms underlying these

three cases are not necessarily the same, and comparing their behavior is a way to inform the more

general debate presented in 1.1.

2. Work done during the internship

Before turning to the methodological part, here is a short summary of the work done during the

internship.

Our primary goal and guiding thread was to find an experimental design that makes it possible to establish the

existence of LIT, PSI and SSI for the most classical case of SI, SOME. This work was essentially

methodological. Two pilot experiments, presented in Appendix 2, enabled us to settle on such a

paradigm. We then ran three sets of experiments. All were based on the same paradigm.

The first set of experiments consisted of two experiments.

The first one (Experiment 1a) is presented in Chapter III. It establishes the existence of three

readings: LIT, PSI and SSI, and further assesses the differences between scalar items: we show that

for SOME and ALMOST, we detect LIT, PSI and SSI, whereas for NUMERALS and PLURAL, we only

detect LIT and SSI. Chapter IV presents a control experiment we had to run afterwards (Experiment

1b), which replicated the results obtained in 1a.

Experiment 2, launched simultaneously with Experiment 1a, is presented in Appendix 3. It led to no

result, probably due to the number of cards presented (4 cards, whereas there were 8 cards in

Experiment 1). Its main goal was to see the effect of presenting the stronger alternative (sentences

containing ALL) on respectively the PSI and the SSI. The experiment was also partly driven by one of

the problems we had running the pilots: the rate of LIT was very high, which made it hard to

differentiate PSI and SSI; we thus explored different factors known to increase implicated readings,

presenting the stronger alternative being one of them. More details are given in Appendix 3.

The last experiment is presented in Chapter V. It applies the dual-task methodology to the paradigm

established in Experiment 1. It had two main goals: first, studying the processing properties of the

three readings distinguished (a point we wanted to address in Experiment 1 studying Response

Times, but our experimental design was not well fitted to this purpose). The dual-task methodology

was interesting because it was more compatible with an online experiment, and it enabled us to have

more precise information on the processing. The second goal was to go further in the comparison

between scalar items: we wanted to see whether we replicated the differences obtained in

Experiment 1 with a more fine-grained approach.

The chapter presents two experiments: Experiment 3a is a reduced version of Experiment 1b,

and serves as baseline for Experiment 3b, which is the dual-task version of the experiment.

3. Experiment 1a: existence of primary and secondary

implicatures for different scalar items

3.1. Goal

The main goal of this experiment was to show that we could experimentally distinguish between

three readings for standard scalar items: the literal reading (LIT), the primary implicature (PSI) and

the secondary implicature (SSI). These three readings are summed up in table 1 below.

Our second goal was to show that the access to the SSI does not depend only on the Competence

Assumption: the SSI can be accessed even when the speaker is not fully informed.

Finally, we use this paradigm as a tool to investigate other cases that have been analyzed as SI:

NUMERALS, PLURAL and ALMOST. We compare their behavior to the behavior of SOME.

Table 1: Three readings (example of SOME)

LIT   Literal reading                 B(some)
PSI   Primary Scalar Implicature      B(some) ∧ ¬B(all)
SSI   Secondary Scalar Implicature    B(some) ∧ B(¬all)

3.2. Method and materials

The experiment consisted of a truth-value judgment task. It was an online experiment, hosted on

Alex Drummond’s Ibex Farm. Participants were recruited via Mechanical Turk and were paid for their

participation.

Participants were presented with a picture composed of two sets of eight cards. Each set of cards

represented the beliefs of a player, Peter or Mary, as shown in the example below:

We manipulated the information level of the players by putting some of the cards face-down. One of

the players was fully informed: he could see all of the cards. The other player was only partially

informed: some of his cards were presented face-down, with a question mark ? printed on them. It was

made clear that Mary and Peter were in front of the same cards but unequally informed.

A sentence was attributed to Mary or Peter: it was displayed on the right or on the left side of the

screen, depending on the speaker, in order to facilitate the matching between the sets of beliefs of

the speaker and the sentence. The speaker could thus be either fully informed (“Knowledgeable

Speaker”) or only partially informed (“Ignorant Speaker”).

Participants had to judge whether the speaker could or could not have said the sentence, according

to her informational state. Two answers were possible: “Mary/Peter can say that” or “Mary/Peter

cannot say that”. For the sake of simplicity, we will use “yes” and “no” to refer to these answers from

now on.

Importantly, with this type of judgment, as opposed to a bare true/false or yes/no judgment8, there

are two reasons for rejecting a sentence:

- the sentence is logically false (it does not match the actual world),

- the speaker does not have enough information to know whether the sentence is true or false9.

We presented both the cards seen by the speaker and the cards seen by another player in order to

check that the participant’s answer depended on the beliefs of the speaker, and not on

the actual situation. Thus, half of the information at each trial was in principle useless: the cards seen

by the other player were supposed not to influence the answer.

Experimental items

Each trial was composed of a sentence and a picture. We first describe the sentences tested, and

then the pictures associated with them.

Sentences

The sentence presented was always of the form “X cards are Y.”

X could be: SOME OF THE, TWO, SOME, ALMOST ALL, ALMOST NO, ALL, NO.

Y could be: heart(s), diamond(s), spade(s), club(s).

8 In pilot experiments, we first used another type of question: we directly asked “Can Mary/Peter say that?”. Participants had to choose between two (TRUE/FALSE) (pilot 1) or three possible answers (TRUE/FALSE/NOT ENOUGH INFORMATION) (pilot 2). The 2-answer version of the pilot was not sensitive enough to detect SSI. The 3-answer version enabled us to detect SSI, but having three answers made the results less easy to interpret. See Appendix for details.

9 We could add to this a third reason: the sentence is true but the speaker believes that it is false. We have not used this option.

Target sentences

Four scalar items were tested: SOME, NUMERALS, PLURAL and ALMOST.

The corresponding sentences were the following:

SOME “Some of the cards are [hearts].” 10

ALMOST “Almost all cards are [hearts].” / “Almost no card is a [heart].”

NUMERALS “Two cards are [hearts].”

PLURAL “Some cards are [hearts].” 11

Control sentences

Two types of sentences, which did not give rise to SI, were included as controls.

NO “No card is a [heart].”

ALL “All of the cards are [hearts].”

Sentences with NO were included in order to check that the participants understood correctly the

meaning of the ? cards, and did not interpret them as representing another type of cards (“a card

which is not a heart”). The critical case was the following: when no heart is visible and some of the

cards are hidden, if the participant understands incorrectly the ? card, she will answer “yes” (“Peter

can say that”), whereas if she understands correctly the ?, she will answer “no” (“Peter cannot say

that.”). Participants who made more than 30% errors on these controls were removed from the

analysis. We also included control cases with NO that were clearly true and false.

Sentences with ALL were added to counterbalance the possible effect of NO on the rate of

implicatures for SOME.

Conditions and pictures

Conditions

The cards displayed with the sentence corresponded to four conditions: the picture could make true

(∅) no reading, (L) the literal reading only, (LP) the literal and the primary reading, or (LPS) the three

readings. We refer to these conditions using the initials of the reading(s) they make true.

10 SOME can combine with a Noun Phrase to form a partitive (“Some of the cards are hearts”) or a non-partitive construction (“Some cards are hearts”). We chose the partitive construction because it had been shown that it favored SI computation (Degen & Tanenhaus, 2011).

11 We have considered other sentences to test the PLURAL case:

(1) “There are hearts.”: With this sentence, it was not clear whether “hearts” referred to the suit of the card or to the symbols on it: in target cases, participants could have answered “yes” because of seeing several symbols on the card.

(2) “There are cards which are hearts.”: This sentence was used in the pilot, but as it sounds quite unnatural to native speakers, we changed it.

∅ and LPS correspond to control conditions (∅: no controls ; LPS: yes controls); L and LP correspond

to target conditions.

- In L, the speaker is knowledgeable and knows that the stronger scalar alternative is true.

If the participant accesses LIT, she will answer “yes”.

If she accesses the PSI or the SSI, she will answer “no”.

This is the classical case used to test implicatures.

- In LP, the speaker is ignorant and does not know whether the stronger scalar alternative is true.

If the participant accesses LIT or PSI, she will answer “yes”.

If she accesses the SSI, she will answer “no”.

This new case enabled us to distinguish SSI from PSI.
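The predictions above can be made explicit with a small possible-worlds sketch. This is only an illustration under stated assumptions, not the material used in the experiment: cards are encoded as "H" (heart), "S" (any other suit) and "?" (face-down), B(p) is taken to hold when p is true in every deal compatible with the speaker's visible cards, and all function names are ours.

```python
from itertools import product

HEART, OTHER, HIDDEN = "H", "S", "?"

def compatible_worlds(cards):
    """Enumerate every full deal compatible with what the speaker sees.
    Each face-down card may turn out to be a heart or another suit."""
    hidden = [i for i, c in enumerate(cards) if c == HIDDEN]
    for filling in product([HEART, OTHER], repeat=len(hidden)):
        world = list(cards)
        for i, value in zip(hidden, filling):
            world[i] = value
        yield world

def believes(cards, proposition):
    """B(p): p holds in every world compatible with the speaker's cards."""
    return all(proposition(w) for w in compatible_worlds(cards))

def some(world):
    """Literal meaning of SOME: at least one heart."""
    return HEART in world

def all_(world):
    """Stronger alternative ALL: every card is a heart."""
    return all(c == HEART for c in world)

def readings(cards):
    """Truth of LIT, PSI and SSI for 'Some of the cards are hearts'."""
    lit = believes(cards, some)
    psi = lit and not believes(cards, all_)               # B(some) and not B(all)
    ssi = lit and believes(cards, lambda w: not all_(w))  # B(some) and B(not all)
    return {"LIT": lit, "PSI": psi, "SSI": ssi}

def predicted_answer(cards, reading):
    """Answer expected from a participant who accesses the given reading."""
    return "yes" if readings(cards)[reading] else "no"

# Speaker's cards for SOME in each condition (assumed encoding of the pictures).
CONDITIONS = {
    "∅ (false)": "SSSSSSSS",
    "∅ (not enough information)": "SSSS????",
    "L": "HHHHHHHH",
    "LP": "HHHH????",
    "LPS": "HHHHSSSS",
}

for name, cards in CONDITIONS.items():
    print(name, readings(cards))
```

Under these assumptions, only a participant who accesses the SSI is predicted to answer “no” in LP, while both PSI and SSI lead to “no” in L.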

The table below presents the correspondence between conditions and cards for SOME.

Condition   Type            Speaker’s cards                            LIT true?   PSI true?   SSI true?
∅           Control (no)    ♤ ♤ ♤ ♤ ♤ ♤ ♤ ♤  /  ♤ ♤ ♤ ♤ ? ? ? ? *      NO          NO          NO
L           Target          ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥                           YES         NO          NO
LP          Target          ♥ ♥ ♥ ♥ ? ? ? ?                           YES         YES         NO
LPS         Control (yes)   ♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤                           YES         YES         YES

∅: The picture makes no reading true.
L: The picture makes only LIT true.
LP: The picture makes LIT and PSI true, but not SSI.
LPS: The picture makes the sentence true whatever the reading.

*: A sentence can be rejected (i) because the sentence is false and the speaker knows it or (ii) because the speaker does not have enough information to say the sentence.

Table 2 presents the speaker’s cards for the four scalar items tested. We present only the target

conditions here: details are presented in Appendix 4.

Scalar item   L                    LP
SOME          ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥      ♥ ♥ ♥ ♥ ? ? ? ?
TWO           ♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤      ♥ ♥ ♤ ♤ ? ? ? ?
PLURAL        ♥ ♤ ♤ ♤ ♤ ♤ ♤ ♤      ♥ ♤ ♤ ♤ ? ? ? ?
ALMOST ALL    ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥      ♥ ♥ ♥ ♥ ♥ ♥ ♥ ?
ALMOST NO     ♤ ♤ ♤ ♤ ♤ ♤ ♤ ♤      ♤ ♤ ♤ ♤ ♤ ♤ ♤ ?

In the experiment, these four conditions were further divided into 7 conditions:

- ∅ condition was subdivided into:

o (1) Expected answer: “no” because the sentence is false and the speaker knows it.

o (2) Expected answer: “no” because the speaker does not have enough information,

(2a) the sentence is true according to the other player’s cards.

(2b) the sentence is false according to the other player’s cards.

- LP condition was subdivided into:

(3a) the sentence is true according to the cards of the other player.

(3b) the sentence is ambiguous according to the cards of the other player.

The comparison between (a) and (b) was a control to show that the cards of the other player did not

influence the answers.

Target cases represented 33% of trials for each type of sentence. There was the same number of

“yes” and “no” controls (see Appendix 5 for the detailed number of trials per condition).

For the control sentences with NO and ALL, as there were no SI associated, there were only 2

conditions (∅ and LPS). ∅ was subdivided into 3 conditions, following the principles explained above.

Pictures

The pictures consisted of a set of 16 cards (8 cards for each player). 8 of them really mattered for the

judgment. The suit (spade/heart/diamond/club) and the rank (from 1 to King) of the cards were

chosen randomly using a program in Python. The suit of the other cards was chosen to be easily

distinguishable from the suit used in the sentence (if it was hearts, other cards could not be

diamonds; if it was spades, other cards could not be clubs). In half of the trials, Mary was the speaker;

in half of the trials, Peter was the speaker. Mary was always on the left side of the screen, and Peter

on the right side of the screen.
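The random choice of cards described above could be implemented along the following lines. This is a hypothetical sketch, not the program actually used: the pattern encoding ("T" for a card of the suit named in the sentence, "O" for a card of the other suit, "?" for a face-down card), the function names and the decision to avoid duplicate cards within a hand are all our assumptions.

```python
import random

SUITS = ["spade", "heart", "diamond", "club"]
RANKS = [str(n) for n in range(1, 11)] + ["Jack", "Queen", "King"]

# Suit of the same colour as the key, which must not be used for the other cards.
SAME_COLOR = {"heart": "diamond", "diamond": "heart",
              "spade": "club", "club": "spade"}

def pick_suits(rng):
    """Pick the suit named in the sentence and a clearly distinguishable
    suit (of the opposite colour) for the other cards."""
    target = rng.choice(SUITS)
    allowed = [s for s in SUITS if s not in (target, SAME_COLOR[target])]
    return target, rng.choice(allowed)

def draw_hand(pattern, target, other, rng):
    """Build one player's hand from a pattern such as 'TTTT????'.
    Face-down cards are represented by None; visible cards by (rank, suit)."""
    # Shuffle the ranks of each suit once so that no card appears twice
    # in the same hand (an assumption; the source does not say).
    ranks = {suit: rng.sample(RANKS, len(RANKS)) for suit in (target, other)}
    hand = []
    for slot in pattern:
        if slot == "?":
            hand.append(None)
        else:
            suit = target if slot == "T" else other
            hand.append((ranks[suit].pop(), suit))
    return hand

rng = random.Random(2015)  # fixed seed, only to make the sketch reproducible
target, other = pick_suits(rng)
print(target, other, draw_hand("TTTT????", target, other, rng))
```

Passing an explicit `random.Random` instance keeps each generated picture reproducible, which is convenient when the same items must be regenerated for a replication.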

Procedure

The experiment was hosted on Alex Drummond's Ibex Farm. After participants had given their consent to

participate in the experiment, they were given instructions (detailed instructions and

training are given in Appendix 6). There was then a training phase (14 non-ambiguous items with feedback),

and then the experimental phase (288 trials with no feedback). The first four sentences with no

feedback were taken from the training phase, in order to get the subjects used to the 'no-feedback'

version of the experiment, and were removed from the analysis. At the end of the experiment, there

was a short questionnaire, with information on age, sex, native language, kind of device used to

answer and Mechanical Turk Worker ID.

3.3. Participants

60 participants were recruited via Mechanical Turk. 59 of them completed the task. We removed

from the analysis 1 participant whose native language was not English, 5 who made more than 35%

(mean-2*standard deviation) errors on controls, and 1 who made more than 31% errors on NO-controls.

We thus present the results for 52 participants (36 females, 16 males; mean age: 41.8, range:

20 to 62 years old).

3.4. Results

Data analyses were conducted using R. We used binomial linear mixed-effects models, built with a

maximal random-effects structure with subjects and items as random factors, although we

sometimes had to step back to random-intercepts-only models when the model failed to converge

with the full random-effects specification (following Barr et al., 2013).

Analysis of responses

Data treatment: We removed the trials whose response times were below 200 ms or above 20000 ms (less than 1%

of the data). We then removed, for each participant, the trials whose response times were more than 2 standard

deviations above or below the participant's mean, keeping 94.9% of the data.
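The two-step data treatment can be sketched as follows. This is a minimal illustration, assuming that each trial is a record with a subject identifier and a response time in milliseconds; the field names and the function are hypothetical, and the actual analysis was run in R.

```python
from collections import defaultdict
from statistics import mean, stdev

def trim_trials(trials, low=200, high=20000, n_sd=2):
    """Apply the two exclusion steps to a list of trials.
    Each trial is a dict with (assumed) keys 'subject' and 'rt' (in ms)."""
    # Step 1: absolute cut-offs, identical for all participants.
    kept = [t for t in trials if low <= t["rt"] <= high]

    # Step 2: per-participant cut-off at mean +/- n_sd standard deviations.
    by_subject = defaultdict(list)
    for t in kept:
        by_subject[t["subject"]].append(t)

    result = []
    for subject_trials in by_subject.values():
        rts = [t["rt"] for t in subject_trials]
        if len(rts) < 2:  # a standard deviation needs at least two values
            result.extend(subject_trials)
            continue
        m, sd = mean(rts), stdev(rts)
        result.extend(t for t in subject_trials if abs(t["rt"] - m) <= n_sd * sd)
    return result

example = [{"subject": "s1", "rt": rt}
           for rt in [100, 1000, 1100, 900, 1000, 1050, 950, 1000, 1020, 980, 5000, 30000]]
print(len(example), "trials before,", len(trim_trials(example)), "after trimming")
```

Note that the per-participant mean and standard deviation are computed after the absolute cut-offs have been applied, which is the order given in the text.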

Overall, the mean error rate on controls was very low (all controls: 2.4%; LPS: 2.9%; ∅: 2.0%).

Figure 1a shows the overall proportion of “yes” answers in the 4 conditions.

The mean of “yes” answers in L, corresponding to LIT alone, was 34.4%.

The mean of “no” answers in LP, corresponding to SSI alone, was 45.1% (mean of “yes”: 54.9%).

To obtain the rate of PSI, we compared the rates of answers in conditions L and LP, making the

assumption that the subjects were coherent in their readings. PSI thus represented 20.5% of the

readings.
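The arithmetic behind these three rates can be written out explicitly. The sketch below only restates the computation described above, under the same coherence assumption (a participant uses the same reading in L and in LP); the function name is ours.

```python
def reading_rates(yes_in_L, yes_in_LP):
    """Estimate the share of each reading (in %) from the two target conditions.
    - 'yes' in L is only compatible with LIT;
    - 'no' in LP is only compatible with SSI;
    - PSI is what remains: 'yes' in LP but 'no' in L."""
    lit = yes_in_L
    ssi = 100 - yes_in_LP
    psi = yes_in_LP - yes_in_L
    return {"LIT": round(lit, 1), "PSI": round(psi, 1), "SSI": round(ssi, 1)}

rates = reading_rates(yes_in_L=34.4, yes_in_LP=54.9)
print(rates)
```

By construction the three rates sum to 100%, so the PSI estimate is only meaningful if the rate of “yes” answers is at least as high in LP as in L.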

We first checked that there was no significant difference between the different instantiations of the

condition ∅ and LP (as explained before, ∅ was instantiated 3 times: “no because the sentence is

logically false”; “no because the speaker does not have enough information, but the sentence is true”;

“no because the speaker does not have enough information and the sentence is false”; LP was

instantiated twice, the other player’s cards making the sentence true or ambiguous). This verification

showed that there was no difference between the two reasons why you could reject a sentence and

that the other player’s cards did not influence the answers. Indeed, there was no difference between

these conditions (see Figure 1b). In all following analyses, we made this verification, but we will not

systematically report it.

Figure 1a – Proportion of “yes” answers (All scalar items)

4 conditions

Figure 1b– Proportion of “yes” answers (All scalar items)

Detailed conditions

Existence of the readings:

We used a binomial mixed model to predict the answer (yes vs no), using Condition as a fixed effect and

Subject as a random factor. Three contrasts enabled us to distinguish between the three readings.

We first did the test for all scalar items, and then for each scalar item.

All scalar items combined:

(1) First, we compared ∅ to L in order to detect LIT. There was a significant difference in

the rate of “yes” answers between the two cases (χ2(1) = 67, p < 4e-16 ***)

(2) We then compared L to LP in order to detect the PSI (χ2(1) = 13, p < .001 ***)

(3) We finally compared LP to LPS in order to detect the SSI (χ2(1) = 70, p < 3e-16 ***)

Effect of scalar item:

We ran the same three tests on each scalar item (see Figure 1c). The table below summarizes the

results.

         ∅ vs L - LIT                      L vs LP - PSI                     LP vs LPS - SSI
SOME     χ2(1)=63.496, p=1.607e-15 ***     χ2(1)=17.831, p=2.414e-05 ***     χ2(1)=8.1926, p=0.004206 **
ALMOST   χ2(1)=10.968, p=0.0009269 ***     χ2(1)=21.554, p=3.44e-06 ***      χ2(1)=38.96, p=4.326e-10 ***
TWO      χ2(1)=7.0938, p=0.007735 **       χ2(1)=0.62, p=.4301               χ2(1)=43.612, p=4.003e-11 ***
PLURAL   χ2(1)=13.949, p=0.0001878 ***     χ2(1)=0.35, p=.557                χ2(1)=57.998, p=2.624e-14 ***

For SOME and ALMOST, we detect the three levels of reading: the differences are significant for the

three tests. For TWO and PLURAL we detect LIT and SSI, but not PSI (TWO: χ2(1) = 0.62, p=.43 ; PLURAL:

χ2(1)=0.35, p= .56).
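For readers who wish to check the correspondence between the test statistics and the p-values reported above, the following sketch may help. It assumes that each statistic follows a chi-square distribution with one degree of freedom, as indicated by the notation χ2(1); the function name is ours and the analyses themselves were run in R.

```python
import math

def chi2_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.
    If Z is standard normal, Z**2 is chi-square(1), so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Statistics reported for the L vs LP contrast (PSI test).
reported = {"SOME": 17.831, "ALMOST": 21.554, "TWO": 0.62, "PLURAL": 0.35}

for item, chi2 in reported.items():
    print(f"{item}: chi2(1) = {chi2}, p = {chi2_p_value_df1(chi2):.4g}")
```

Small discrepancies with the reported p-values are expected for TWO and PLURAL, because their statistics are given to two decimals only.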

In order to check this difference of behavior between, on the one side, SOME and ALMOST, and on

the other side, TWO and PLURAL, we also tested the interactions between the different scalar items

for the PSI-test. Results are summed up below. As expected, the interaction is not significant in two

cases: SOME vs ALMOST and TWO vs PLURAL.

L vs LP   ALMOST                    TWO                               PLURAL
SOME      χ2(1)=0.0966, p=0.756     χ2(1)=32.34, p=1.291e-08 ***      χ2(1)=39.547, p=3.20e-10 ***
ALMOST    -                         χ2(1)=13.96, p=0.0001859 ***      χ2(1)=18.11, p=2.085e-05 ***
TWO       -                         -                                 χ2(1)=1.1495, p=0.2836

Figure 1c – Proportion of “yes” answers by condition and scalar item

Readings by subjects

Figure 2 shows the responses of each subject combining the two target conditions L and LP, for each

scalar item tested. Each data point corresponds to a subject (a jitter was added in order to make the

results readable). Subjects that consistently access LIT are in the top right corner (they answer “yes”

in both cases). Subjects that derive the PSI but not the SSI are in the top left corner (they answer “no”

in L and “yes” in LP). Subjects that consistently access the SSI are in the bottom left corner (they

answer “no” in both cases). Crucially, we see that for SOME and ALMOST, there is nearly no subject

answering “no” in LP but “yes” in L, whereas this is not the case for NUMERALS and PLURAL.

Figure 2 – Readings by subject

Response times (description)

Figure 3 shows the mean response time by condition. (We removed RT>10000ms).

Descriptively, it seems that our results replicate the

classical findings (Bott & Noveck, 2004): in L condition, the

mean response time for “no” answers is higher than for

“yes” answers (compared to the mean response times in

control conditions).

In LP, the pattern is reversed: answering “yes” takes

slightly more time than answering “no”. This does not

match the hypothesis of the Gricean account, which

predicts a higher response time for “no” answers than for

“yes” answers. We will return to the possible explanations

for this result in the discussion.

Before turning to the discussion, we are going to present a control experiment we had to run for

experiment 1a.

Figure 3 – mean Response Times by

answer and condition

4. Experiment 1b: control experiment

4.1. Goal

A criticism could be made of our first experiment: all conditions presenting an ignorant speaker (i.e.

seeing ? cards) were either target conditions (LP) or no-controls (∅). That meant that there was no

control condition in which the participant had to answer “yes” with an ignorant speaker12. In order to

control for this possible bias, we ran another experiment, perfectly identical to experiment 1, except

that we included a control condition where the subject had to answer “yes” even if the speaker was

ignorant.

The material was exactly the same as in experiment 1, except that we added a version of the LPS

condition (that corresponds to the “yes” control condition) in which the speaker was ignorant.

(We will refer to it as “LPS-IGNO”, as opposed to “LPS-KN” for the previous version).

- For the sentences testing SOME and PLURAL, it corresponded to the cards ♥ ♥ ♥ ♤ ♤ ? ? ?

- For the sentences using TWO, ALMOST, ALL and NO, one needs to have all the information to

answer “yes”: it was therefore not possible to implement.

In order to increase the proportion of conditions in which the expected answer was “yes” despite

ignorance of the speaker, we also added sentences with FIRST and LAST (Example: “The first card is a

heart”, with ♥ ♥ ♤ ♤ ? ? ? ?). We will also refer to these conditions as “LPS-IGNO” condition. We also

added control trials corresponding to “LPS-KN”, “∅-IGNO”, and “∅-KN”.

Experiment 1b thus consisted of 320 trials. The proportion of controls (LPS and ∅) and target cases

was the same as in experiment 1. 20% of LPS were “LPS-IGNO”.

The procedure was exactly the same as in Experiment 1a.

4.3. Participants

60 participants were recruited via Mechanical Turk. 59 of them completed the task. We removed

from the analysis 2 participants whose native language was not English, 6 that made more than

46,3% (m-2sd) of errors on controls, and 6 who made more than 30,5% of errors on NO-control. We

thus present the results for 45 participants (30 females, 15 males; mean age: 35,5, from 19 to 61 years old).

12 To be exact, this was the case in two of the examples given during the training phase. This did not seem strong

enough, however, to rule out that the absence of a “LPS-IGNO” condition influenced our results by creating a bias to answer “no” when the speaker was ignorant.

4.4. Results

Analysis of responses

Data treatment: We removed the trials with response times below 200ms or above 20000ms (1,43% of the

data). We then removed, for each participant, the trials that were more than 2sd above or below that participant’s mean (m +/- 2sd),

keeping 94,8% of the data.
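The two-step trimming described above can be sketched as follows. This is a minimal illustration in Python, not the script used for the study: the function name, the (participant, RT) data layout, the inclusive bounds, and the use of the sample standard deviation computed after the first step are all assumptions.

```python
from statistics import mean, stdev


def trim_response_times(trials, low_ms=200, high_ms=20000, n_sd=2.0):
    """Return the trials kept after the two trimming steps.

    trials: list of (participant_id, rt_ms) pairs (hypothetical layout).
    """
    # Step 1: absolute cutoffs (assumed inclusive).
    in_range = [(pid, rt) for pid, rt in trials if low_ms <= rt <= high_ms]

    # Group the remaining response times by participant.
    by_participant = {}
    for pid, rt in in_range:
        by_participant.setdefault(pid, []).append(rt)

    # Step 2: keep trials within mean +/- n_sd * sd of the participant's own RTs.
    kept = []
    for pid, rt in in_range:
        rts = by_participant[pid]
        if len(rts) < 2:
            # A standard deviation is undefined for a single trial: keep it.
            kept.append((pid, rt))
            continue
        m, sd = mean(rts), stdev(rts)
        if abs(rt - m) <= n_sd * sd:
            kept.append((pid, rt))
    return kept
```

The proportion of data kept can then be computed as len(kept) / len(trials).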

Figure 1a shows the proportion of “yes” in the 4 conditions, Figure 1b according to the detailed

conditions.

Figure 1a – Proportion of “yes” answers

(4 conditions)

Figure 1b– Proportion of “yes” answers

(detailed conditions)

As in experiment 1, the mean of errors on controls was very low (LPS: 5,4% (LPS-KN: 4,1 %, LPS-IGNO:

8,5%) ; ∅: 1,3% (∅-KN: 1,5 %, ∅-IGNO: 1,1 %)).

The mean rate of “yes” answers in L, corresponding to LIT alone, was 30 % (vs 34% in experiment 1a).

The mean rate of “no” answers in LP, corresponding to SSI alone, was 53% (vs 45% in experiment

1a). PSI thus represented 23% of the readings.

As in experiment 1a, we first checked that there was no significant difference between the different

instantiations of the condition LP and ∅. In particular, we checked that there was no difference

between the two instantiations of the LPS condition (χ2(1)=0.64 , p=.42).

Existence of the readings

As in experiment 1, three tests enabled us to distinguish between the three readings. We first did the

test for all scalar items (see Figure 1a), and then for each scalar item (Figure 1c). The table below

summarizes the results.

            ∅ vs L - LIT                     L vs LP - PSI                    LP vs LPS - SSI
All items   χ2(1)=54.806, p=1.33e-13 ***     χ2(1)=17.744, p=2.528e-05 ***    χ2(1)=62.792, p=2.298e-15 ***
SOME        χ2(1)=20.301, p=6.617e-06 ***    χ2(1)=21.292, p=3.943e-06 ***    χ2(1)=13.685, p=0.0002162 ***
ALMOST      χ2(1)=8.1686, p=0.004262 **      χ2(1)=18.19, p=1.999e-05 ***     χ2(1)=44.808, p=2.173e-11 ***
TWO         χ2(1)=5.36, p=0.0206 *           χ2(1)=0.7503, p=0.3864           χ2(1)=69.391, p<2.2e-16 ***
PLURAL      χ2(1)=18.122, p=2.072e-05 ***    χ2(1)=2.1696, p=0.1408           χ2(1)=9.111, p=0.002541 **

Figure 1c – Proportion of “yes” by condition and scalar item
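As a reading aid for the table above, the p-value attached to a χ2 statistic with one degree of freedom can be recomputed from the statistic alone: for one degree of freedom, the upper-tail probability equals erfc(√(x/2)). The sketch below uses only the Python standard library; the function name is illustrative, and the sketch says nothing about how the statistics themselves were obtained.

```python
import math


def chi2_1df_p_value(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For 1 df, the chi-square variable is the square of a standard normal,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Example: the PLURAL entry of the LP vs LPS column.
print(chi2_1df_p_value(9.111))
```

Such a check is only a consistency test between a reported statistic and its reported p-value.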

We also tested the interactions between the different scalar items for the PSI-test: as in 1a, the

interaction is not significant in two cases (SOME vs ALMOST and TWO vs PLURAL).

L vs LP   ALMOST                   TWO                             PLURAL
SOME      χ2(1)=0.0878, p=0.767    χ2(1)=19.18, p=1.189e-05 ***    χ2(1)=29.757, p=4.898e-08 ***
ALMOST    -                        χ2(1)=7.9883, p=0.004708 **     χ2(1)=9.8767, p=0.001674 **
TWO       -                        -                               χ2(1)=8e-04, p=0.977

Readings by subjects

Figure 2 shows the responses of each subject combining the two target conditions. As in experiment

1a, we see that for SOME and ALMOST, there is nearly no subject answering “yes” in L and answering

“no” in LP, whereas this is not the case for NUMERALS and PLURAL.

Response times

Figure 3a shows the mean response time by condition. Figure 3b shows the results for the eight

conditions. (We removed RT>10000ms). Descriptively, there is no difference between the mean

response times of “yes” and “no” answers in the L condition. However, when compared to the

controls, we find the same “cost” as in 1a: subjects are slower to answer “no” than “yes”.

Figure 2 – Readings by subject

Figure 3a – mean Response Times by answer

4.5. Conclusion

This control experiment showed that the presence of a “LPS-IGNO” condition did not influence our

results. First, there was no significant difference between the rate of correct answers in “LPS-KN” and

“LPS-IGNO”. Second, we still detect the SSI (53% of “no” in condition LP), which is even a higher rate

than in Experiment 1a (45%). Interestingly, Experiment 1b confirmed that Response Times were not

very reliable, or at least hard to analyze: it was the only point on which we did not replicate

previous results.

Discussion for Experiment 1a and 1b

The aim of these experiments was twofold: first, we wanted to show that three levels of readings

existed, LIT, PSI and SSI, for standard scalar items (SOME); second, we wanted to compare the

behavior of three debated cases of Scalar Implicatures: ALMOST, NUMERALS and PLURAL.

Existence of the readings

For SOME, our results confirmed that three readings could be distinguished: LIT, PSI and SSI. This

means that for standard scalar items, the SI classically opposed to LIT in experimental studies can be

further decomposed into PSI and SSI, as predicted by the Neo-Gricean account of SI.

The existence of SSI per se had not been straightforwardly established previously (Chemla & Singh,

2013; for arguable attempts, see Grodner, unpublished). Our paradigm makes it possible to distinguish it from the

PSI: when the speaker is ignorant about the truth value of the stronger alternative, the sentence is

rejected if the participant accesses SSI (“the speaker believes that some but not all”), but accepted if

she accesses PSI or LIT.

Importantly, we show that the SSI can be accessed even when the speaker is presented as ignorant.

As explained in the Introduction, on standard Neo-Gricean accounts, the “Epistemic Step” (from PSI

to SSI) involves an assumption that the speaker is knowledgeable, called the “Competence

Assumption” (Sauerland, 2004). When this assumption is not warranted, i.e. when the listener knows

that the speaker does not have full knowledge, listeners should not compute SSI.

Let’s note that we controlled for the fact that participants may have judged the sentence according

to the actual situation (and not the beliefs of the speaker): even in cases where the sentence was

true according to the other player’s cards, participants rejected the sentence: this shows that they

accessed the SSI, and did not reject the sentence because it could be false.

Here, we show that participants can access the SSI reading whatever the information level of the

speaker. This does not mean that the level of information does not influence the likelihood of

deriving the SSI, as shown by Bergen & Grodner (2012), but challenges the strict Neo-Gricean

account on a specific point: the factors that play a role in the step from PSI to SSI. It is possible to

adapt the Gricean account, adding that the Epistemic Step does not rely (or, at least, not only) on the

Competence Assumption. Our results do not directly challenge the assumption that the computation

is incremental.

One could argue that our paradigm - a judgment on the appropriateness of a sentence - is not

ecological enough and does not represent a very naturalistic assessment of the computation of SI.

We tried to make the context not too remote from a “real life” situation (a card game with two

named players speaking), but it can still be argued that in “real life” situations, the SSI would never

be accessed in ignorant speaker conditions, because there is a qualitative difference between a real

human being and a fictional character.

The second new result is that PSI exist, and are distinct from SSI, as predicted by the Neo-Gricean

account. We do not have a condition in which only the PSI is true as for LIT or false as for SSI: the

result is based on the comparison between the rates of answer in two conditions, one in which the

speaker knows that the stronger alternative is true (L), and the other one in which the speaker is

ignorant regarding the truth value of the stronger alternative (LP). We make the assumption

(supported by previous studies) that subjects are coherent in their readings between the different

conditions, an assumption that is further supported by looking at the readings by subjects.

One could argue that the effect is due to the effect of the speaker’s information level only: the

decrease of “no” answers between L and LP could be just due to the fact that you derive more SSI

when the speaker is knowledgeable. Let’s assume that PSI do not exist and that all “no” answers in

the L condition correspond to SSI readings. Two facts show that this criticism does not hold: first,

studying the pattern of answers by subjects, we see that nearly no subject derives the SSI but not the

PSI (almost all subjects answering “no” in the LP condition, i.e. accessing the SSI, also answer “no” in

the L condition), whereas there is a group of subjects accessing the PSI but not the SSI.

Second, and perhaps even more convincingly, the comparison of the behavior between scalar items

shows that the difference in the rate of “no” answers between the conditions L and LP cannot

depend solely on the level of information of the speaker: for NUMERALS and PLURAL, the difference

between L and LP is not significant. Let’s turn to this second issue.

Comparison between scalar items

Our second goal was to inform theoretical debates on three contentious cases that have been

analyzed in terms of SI. Our results show that the relative proportions of LIT, PSI and SSI differ for

SOME, ALMOST, NUMERALS and PLURAL: whereas for SOME and ALMOST, there are three distinct

readings, for NUMERALS and PLURAL, there are only two distinct readings: LIT and SSI.

For ALMOST, our results strongly support an SI account (Sadock, 1981, Spector, 2012). First, we show

that a literal reading exists, which to our knowledge had never been established: a sentence such as

“ALMOST ALL cards are hearts” is accepted when all cards are hearts. This is challenging for an

account of the “not all” meaning component of ALMOST as a logical entailment (Hitzeman, 1992;

Horn, 2011; Kilbourn-Ceron, 2015). This conclusion is further supported by the striking similarities of

behavior between SOME and ALMOST: even if the rate of derivation differs (20% of LIT for ALMOST,

35% for SOME), which is not very surprising, given the study by van Tiel & al. (2013), we find the

same overall distribution of PSI and SSI. Let’s note that we tested two sentences for ALMOST

(ALMOST ALL and ALMOST NO): the same pattern shows up with the two sentences, the rate of

strengthened meaning being a little higher with ALMOST ALL than ALMOST NO.

Finally, the result for ALMOST strengthens the conclusion we draw for SOME, showing that the result

is likely to generalize to other items.

Regarding NUMERALS and PLURAL, we find a different pattern of answers, suggesting that there is

no PSI, or, at least, that PSI is less accessible. This suggests that the underlying mechanism is not the

same as for standard scalar items.

For NUMERALS, this new result adds to other differences already found between numerals and

standard scalar items, regarding syntactic distribution (Horn, 1992, Breheny, 2008), acquisition

(Papafragou & Musolino, 2003; Huang & Snedeker, 2009) and processing (Huang & Snedeker, 2009;

Marty & al., 2013), and challenges the traditional SI account proposed by Horn (1972).

For PLURAL, in the same way, our results suggest that the “strictly more than one” meaning

component is not obtained by the same mechanisms as standard scalar items. The parallel with

NUMERALS remains to be explored with other studies.

Processing properties

Still, both Gricean and Grammatical accounts can account for the bare existence of the readings, and

even for the observations by subjects. A way to inform the debate is to study their processing

properties.

In this experiment, we wanted to inform this question studying response times. Descriptively, it

seems that our results replicate the classical findings (Bott & Noveck, 2004): in L condition, there is a

delay associated with the computation of the implicature. In LP, however, the pattern is reversed:

answering “yes” takes slightly more time than answering “no”; but when compared to the control

conditions, there is no difference. This is not what is predicted by the Gricean account, where (if

anything) there should be a cost associated to the step from PSI to SSI. But this can be explained by

other factors: in particular, in LP, ? cards are displayed, which might make the answer “yes” harder to

Given that this was an online experiment and that we had not implemented the adequate controls

(e.g. counterbalancing for the position of “yes” and “no” answers on the screen), we decided not to

go further in this analysis, and to use another methodology to assess the processing cost associated

with the derivation of PSI and SSI: a dual-task experiment.

5. Experiment 2: dual-task impact on primary and secondary implicatures

5.1. Goal

The main goal of this experiment was to study the processing properties of the three levels of

reading established in Experiment 1. This was also a way to have a better understanding of the “cost”

traditionally associated to SI.

As in Experiment 1, our second goal was to compare the behavior of four scalar items: SOME,

ALMOST, NUMERALS and PLURAL.

The dual-task methodology was interesting for us in at least two respects: first, it was more

compatible with an online experiment than the study of Response Time; second, it could bring more

precise information on the processing, indicating whether memory resources are involved.

As its name suggests, in a dual-task experiment, participants are asked to perform two tasks at the

same time. The underlying reasoning is based on the working memory model, which we will only

briefly explain here for reasons of space (but see Baddeley 1992, Miyake & Shah 1999, Engle, 2002).

Crucially, the paradigm relies on the assumption that human executive cognitive resources (i.e.

working memory resources) are limited: introducing a second task reduces the resources available for

the first task, facilitating automatic responses and inhibiting analytic responses (de Neys, 2006).

Studying SI, this could be a way to single out the “basic” meaning of a sentence, by blocking potential

strengthening mechanisms.

From a methodological point of view, the paradigm is based on the comparison between two

conditions of Cognitive Load, a factor characterizing the degree to which working memory resources

are burdened: a LOW-CL condition (with an easy second task), a HIGH-CL condition (with a harder

second task). We also added a NO-CL condition (baseline, without the second task).

Four studies have tested SI using a dual-task paradigm (De Neys & Schaeken, 2007; Dieussaert & al.,

2011; Marty & Chemla, 2011; Marty & al., 2013). These four studies have shown that, for sentences

containing SOME, participants derived fewer SI as the second task became harder. This was a way to

confirm, in line with other studies using Response Time, self-paced reading or visual-world paradigm,

that SI were “costly” as compared to LIT, i.e. were not generated automatically as proposed by

Levinson (2000). Moreover, the dual-task methodology is, in a way, more precise than Response

Time studies: it indicates whether the cognitive effort associated with the processing involves central

working memory resources, whereas the conclusion we can draw from Response Time studies is that

SI are derived later than LIT, which does not characterize the nature of the resources involved.

In these four studies, PSI and SSI are confounded. Using our paradigm was also a way to understand

at which level of SI processing working memory resources were specifically involved.

The second goal of this experiment was, as in Experiment 1, to compare the behavior of four scalar

items. Interestingly, Marty & al (2013) found that the effect of the dual task was reversed for

NUMERALS as compared to SOME: for NUMERALS, participants accessed the “exactly n” reading (“implicated” reading) more often

under high cognitive load, whereas for SOME, they accessed the “some and possibly all”

reading (“literal” reading) more often. This result also led us to choose this methodology. We used nearly

exactly the same paradigm, which allowed us to check that we could replicate - and then extend -

their results.

The experiment was an online experiment, hosted on Alex Drummond’s Ibex Farm. Participants were

recruited via Mechanical Turk and were paid for their participation.

Participants had to do two tasks at the same time:

- a truth value judgment task, identical to the task implemented in Experiment 1.

- a letter memory task, very similar to the task implemented by Marty & al (2013).

5.2.1. Truth Value Judgment Task

As in Experiment 1, each trial consisted of a picture and a sentence.

Sentences

The sentence was always of the form “X cards are [hearts]”.

As in experiment 1, four scalar items were tested:

SOME “Some of the cards are [hearts].”

TWO “Two cards are [hearts].”

PLURAL “Some cards are [hearts].”

ALMOST “Almost all cards are [hearts].” / “Almost no card is a [heart].”

We also included control sentences with NO and ALL.

Pictures

Each picture was composed of two sets of eight cards. Each set of cards corresponded to the beliefs

of a player, Peter or Mary. We manipulated the information level of the players by putting some of

the cards face-down, with the symbol ? on them.

At each trial, a sentence was attributed to one of the players. The participant had to judge whether

the speaker could or could not have said the sentence given her informational state.

The cards displayed corresponded to four conditions, depending on the readings they made true:

Condition         ∅ (Control)                            L (Target)         LP (Target)        LPS (Control)
Speaker’s cards   ♤ ♤ ♤ ♤ ♤ ♤ ♤ ♤ / ♤ ♤ ♤ ♤ ? ? ? ? *     ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥    ♥ ♥ ♥ ♥ ? ? ? ?    ♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤
LIT               NO                                     YES                YES                YES
PSI               NO                                     NO                 YES                YES
SSI               NO                                     NO                 NO                 YES

∅ and LPS are control conditions; L and LP are target conditions. As in experiment 1b, they

corresponded to 8 actual conditions, depending on the cards of the other player.
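The table above can be read as a lookup from readings to predicted answers. The following sketch (in Python, with illustrative names that are not part of the experimental material) encodes it and recovers the reading compatible with a pair of answers in the two target conditions. It assumes that a subject uses the same reading in L and LP.

```python
# Predicted answer for each reading in each condition, copied from the table.
PREDICTED = {
    "LIT": {"∅": "no", "L": "yes", "LP": "yes", "LPS": "yes"},
    "PSI": {"∅": "no", "L": "no", "LP": "yes", "LPS": "yes"},
    "SSI": {"∅": "no", "L": "no", "LP": "no", "LPS": "yes"},
}


def classify_reading(answer_l, answer_lp):
    """Return the reading compatible with the answers in L and LP, or None.

    None corresponds to the pattern predicted by no reading:
    "yes" in L together with "no" in LP.
    """
    for reading, predicted in PREDICTED.items():
        if predicted["L"] == answer_l and predicted["LP"] == answer_lp:
            return reading
    return None


for answers in [("yes", "yes"), ("no", "yes"), ("no", "no"), ("yes", "no")]:
    print(answers, classify_reading(*answers))
```

The two control conditions do not discriminate between readings, which is why only L and LP enter the classification.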

5.2.2. Letter Memory Task

The memory task was a short term storage task of sequences of letters, based on the task

implemented by Marty & al.

Before the Truth Value Judgment Task, a sequence of letters was shown to the participants. The

letters were presented one after the other for 800ms each, with a 50ms pause between them. They were

displayed in the center of the screen, in black, in upper case.

The sequences were generated randomly using a program in Python. We used 9 letters: B, F, H, J, L,

M, Q, R and X (chosen to be phonologically distinct).

After the Truth Value Judgment Task, participants had to give back the sequence of letters in reverse

order.

They were given feedback at the end of the trial: either “Correct”, displayed in green in the center of

the screen, or “Wrong”, displayed in red, with an error message (ex. “You typed DL and the correct

answer was LF.”)
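The generation of the sequences and the scoring of the reversed response can be sketched as follows. This is a minimal illustration in Python, not the original program: the function names are hypothetical, the wording of the feedback string is only modelled on the example above, and the sketch assumes that a letter is not repeated within a sequence, which the description does not specify.

```python
import random

# The nine letters listed above.
LETTERS = ["B", "F", "H", "J", "L", "M", "Q", "R", "X"]


def generate_sequence(length, rng=random):
    """Draw a random sequence of distinct letters (assumption: no repeats)."""
    return "".join(rng.sample(LETTERS, length))


def expected_response(sequence):
    """Participants had to give the sequence back in reverse order."""
    return sequence[::-1]


def feedback(sequence, typed):
    """Return the feedback message for a typed response."""
    correct = expected_response(sequence)
    if typed.strip().upper() == correct:
        return "Correct"
    return f"Wrong. You typed {typed} and the correct answer was {correct}."


# 2 letters in the LOW-CL condition, 4 letters in the HIGH-CL condition.
low_sequence = generate_sequence(2)
high_sequence = generate_sequence(4)
print(low_sequence, expected_response(low_sequence))
print(high_sequence, expected_response(high_sequence))
```

Passing a seeded random.Random instance as rng makes the generated sequences reproducible.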

The Cognitive Load was manipulated by varying the length of the sequence of letters: 2 letters in the

LOW-CL and 4 letters in the HIGH-CL condition. Memory resources were supposed to be more heavily

tapped in the HIGH condition. The cognitive load was manipulated within subjects: each participant

performed the LOW-CL as well as the HIGH-CL task. We also implemented a control experiment, with

no dual task, as a baseline.

Participants in the dual-task version of the experiment were administered two blocks of 92 trials,

with a short break between them: one block contained LOW-CL trials, and the other block contained

HIGH-CL trials. The order of the blocks was randomly determined for each participant. In each block,

the order of items and the correspondence between the sequence of letters and the truth-value

judgment task item was generated randomly.

Due to the difficulty of the dual task, we had to reduce the number of trials from Experiment 1b (320

trials). Given that our previous results suggested that the presence of a “LPS-IGNO” condition did not

strongly influence the judgments, we removed sentences with FIRST and LAST, but kept the

subdivision of LPS into a KN and an IGNO condition. Moreover, in order to reduce the number of trials,

we removed half of the ∅ and LPS conditions (i.e. control trials). The proportion of target cases in the

experiment was thus higher than in Experiment 1b: there were 25% ∅, 25% L, 25% LP, 25% LPS (i.e.

50% of target cases). There were 184 trials in total. (See Appendix 5 for the exact distribution of

conditions.)

5.2.3. Procedure

The experiment was an online experiment, hosted on Alex Drummond's Ibex Farm.

After having given their consent to participate in the experiment, participants were given instructions

concerning the Truth Value Judgment Task only. There was then a first training (4 non ambiguous

sentences with feedback). Then, participants were given the second part of the instructions,

concerning the Letter Memory Task. There was a second training with the Memory Task, on the 4

same sentences as before. The sequences were composed of 2 or 4 letters depending on the block

they started with.

The experimental phase was divided into two blocks, according to the cognitive load (LOW-CL vs HIGH-CL).

Participants were asked to take a short break between them. The first two sentences after the

break were taken from the training phase, in order to get the subjects used to the new number of

letters to memorize.

Each trial started with the presentation of the sequence of letters. Then, the Truth Value Judgment

task was displayed. It remained until the participant answered. Next, participants had to reproduce

the sequence of letters in reverse order. Last, they were given feedback on the accuracy of their

answer (see Figure 2).

At the end of the experiment, there was a short questionnaire (information on age, sex, native

language, kind of device used to answer, Mechanical Turk Worker ID, and kind of strategy used to

memorize the letters). This last question was included in order to control for the fact that

participants may have written the letters, as it was an online experiment.

For the no dual task version of the experiment, the procedure was exactly the same, except that the

instructions and the final questionnaire were adapted to the task.

5.3. Participants

59 participants were recruited via Mechanical Turk for the dual-task version of the experiment, and

61 for the no-dual task version of the experiment.

In the dual-task version, we had to remove 4 participants due to a problem loading their data, 3 who

made more than 25% of errors on NO-controls, and 1 who indicated that he had written the letters

for the memory task. All participants reported that English was their native language. In the no-dual

task version, we removed from the analysis 3 participants whose native language was not English, 2

that made more than 19% (m-2sd) of errors on controls, and 2 who made more than 25% of errors

on NO-controls. We thus had 51 participants for the dual task version of the experiment (24 LOW-HIGH,

27 HIGH-LOW), and 54 participants for the no dual task version (58 females, 47 males, mean

age: 39,7, from 19 to 65 years old).

Figure 2: Description of the dual-task procedure: (1) Presentation of the sequence of letters (2 or 4 letters); (2) Truth Value Judgment Task; (3) Letter Memory Task; (4) Feedback on Letter Memory Task

5.4. Results

Data analyses were conducted using R. We used binomial linear mixed-effects models, built with a

maximal random effect structure based on subjects and items as random variables, although we

sometimes had to step back to random-intercepts-only models when the model failed to converge

with the full random-effects specification (following Barr et al., 2013).

5.4.1. Letter Memory Task

The mean rate of correct answers on the Memory Task was overall quite high (89%). As expected,

there was a significant effect of the Cognitive Load condition (see figure 3a): participants made more

errors on HIGH load trials than on LOW load trials (correct answers: HIGH: 83,3% (SD: .03) vs LOW: 94,1% (SD: .01);

χ2(1)=46, p=1.178e-11 ***). This confirmed that the 4-letter sequences were more demanding

than the 2-letter sequences.

There was no effect of the order of blocks (χ2 (1) = 0.13 , p= .72) (see Figure 1.)
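As a quick check of these figures, and assuming that LOW and HIGH trials are equally frequent (two blocks of 92 trials each), the overall accuracy is the trial-weighted mean of the two block accuracies. A minimal sketch in Python, with illustrative variable names:

```python
# Accuracy on the Letter Memory Task, as reported above.
accuracy_by_load = {"LOW": 0.941, "HIGH": 0.833}

# Assumption: both loads contribute the same number of trials (92 per block).
trials_by_load = {"LOW": 92, "HIGH": 92}

total_trials = sum(trials_by_load.values())
overall_accuracy = sum(
    accuracy_by_load[load] * trials_by_load[load] for load in accuracy_by_load
) / total_trials
overall_error = 1.0 - overall_accuracy

print(f"overall accuracy: {overall_accuracy:.1%}")
print(f"overall error: {overall_error:.1%}")
```

Under this assumption the overall accuracy comes out at about 88,7%, in line with the 89% reported above.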

Tradeoff analysis:

We further tested whether there was an effect of the answer on the Truth-Value Task on the

Memory Task results. Indeed, the cost associated with the SI computation can show up on the rate of

correct answers on the Memory Task. This corresponded to the following hypothesis: in conditions L

and LP, there should be more errors if the participant previously answered “no” (i.e. the implicature

was derived) than if he previously answered “yes”; in contrast, in control conditions, the previous

answer should not affect the rate of errors.

Figure 1 – Proportion of correct answers (Memory Task), by Cognitive Load condition and order of blocks

Figure 2 – Proportion of correct answers (Memory Task), by Condition and Answer on the TVJT

The results were not significant whatever the Cognitive Load (LOW, HIGH or merged): there was no

effect of the answer given on the TVJT (condition L, all scalar items and all Cognitive Load

conditions combined: χ2 (1) = 0.3227, p = 0.57; condition LP: χ2 (1) = 0.3852, p= 0.5348). There was no effect

either when tested by scalar item. This means that there was no tradeoff between the two tasks.

5.4.2. Truth value judgment task

First, we checked whether we replicated the findings of Experiment 1 for the no-dual-task version of

the experiment (see Figure 3a):

NO DUAL-TASK   ∅ vs L - LIT                     L vs LP - PSI                    LP vs LPS - SSI
All items      χ²(1)=66.113, p=4.259e-16 ***    χ²(1)=32.435, p=1.233e-08 ***    χ²(1)=126.88, p<2.2e-16 ***
SOME           χ²(1)=15.333, p=9.012e-05 ***    χ²(1)=30.836, p=2.807e-08 ***    χ²(1)=12.692, p=0.0003671 ***
ALMOST         χ²(1)=11.826, p=0.000584 ***     χ²(1)=32.23, p=1.369e-08 ***     χ²(1)=20.93, p=4.763e-06 ***
TWO            χ²(1)=10.087, p=0.001493 **      χ²(1)=8.0943, p=0.00444 **       χ²(1)=33.47, p=7.239e-09 ***
PLURAL         χ²(1)=27.702, p=1.415e-07 ***    χ²(1)=1.4909, p=0.2221           χ²(1)=10.066, p=0.00151 **

We replicated the results of Experiment 1, except that for NUMERALS, the difference between L and

LP was now significant (χ²(1)=8.0943,p=0.00444**): we detected the PSI. The interaction between

TWO and PLURAL was significant (χ²(1)=4.2033, p=0.04035*), but it was also significant between

TWO and SOME (χ²(1)=27.134, p=1.899e-07***) and between TWO and ALMOST (χ²(1)= 15.80,

p=7.037e-05***). The interaction between SOME and ALMOST was, on the contrary, not significant

(χ²(1)=0.6958, p=0.4042).

Figure 3a –Truth value judgment task

(No dual-task experiment)

Figure 3b – Truth value judgment task

(Dual-task experiment)

Dual-task experiment:

In subsequent analyses, trials with incorrect answers on the Memory Task were removed (about

11,3% of the trials).

First, we conducted the same analysis on the dual-task version of the experiment (see Figure 3b):

DUAL-TASK   ∅ vs L - LIT                     L vs LP - PSI                    LP vs LPS - SSI
All items   χ²(1)=53.132, p=3.119e-13 ***    χ²(1)=18.674, p=1.551e-05 ***    χ²(1)=45.93, p=1.226e-11 ***
SOME        χ²(1)=19.035, p=1.284e-05 ***    χ²(1)=18.253, p=1.934e-05 ***    χ²(1)=28.977, p=7.324e-08 ***
ALMOST      χ²(1)=8.7863, p=0.003035 **      χ²(1)=22.338, p=2.286e-06 ***    χ²(1)=9.943, p=0.001615 **
TWO         χ²(1)=50.953, p=9.462e-13 ***    χ²(1)=3.4403, p=0.06362          χ²(1)=23.617, p=1.175e-06 ***
PLURAL      χ²(1)=55.567, p=9.033e-14 ***    χ²(1)=0.6822, p=0.4088           χ²(1)=12.692, p=0.0003671 ***

We perfectly replicated findings of Experiment 1 regarding the existence of readings and the

difference between scalar items. Note that the result for TWO regarding the PSI is nearly significant.

Effect of Cognitive Load

Figure 4 shows the effect of Cognitive Load (comparing NO Cognitive Load (baseline), LOW Cognitive

Load and HIGH Cognitive Load) for each scalar item, depending on the condition.

For SOME and ALMOST, our first hypothesis was that the proportion of LIT would increase with

Cognitive Load, following results of previous studies. For NUMERALS, we expected the reverse

pattern, as found by Marty & al. (2013): the proportion of LIT would decrease when the Cognitive Load is higher.

We had no precise expectation concerning PLURAL.

Following the simplest Gricean account (assuming that the effect is mostly due to the retrieval and

manipulation of the alternative sentence), PSI and SSI should be impacted by the Cognitive Load in

the same way.

First, we tested the effect of having a dual task on the 3 readings, comparing the no-dual-task (NO

CL) to the dual-task version of the experiment (merging HIGH and LOW CL). We tested the interaction

between the condition and the version of the experiment for each scalar item:

NO CL vs CL   ∅ vs L - LIT                  L vs LP - PSI               LP vs LPS - SSI
SOME          χ²(1)=0.0195, p=0.889         χ²(1)=2.7219, p=0.09898     χ²(1)=0.4567, p=0.4992
ALMOST        χ²(1)=10.06, p=0.001515 **    χ²(1)=0.1736, p=0.6769      χ²(1)=0, p=0.9969
TWO           χ²(1)=2.8795, p=0.08971       χ²(1)=0.0028, p=0.9581      χ²(1)=0.365, p=0.5457
PLURAL        χ²(1)=0.0299, p=0.8626        χ²(1)=0.1648, p=0.6848      χ²(1)=6.2872, p=0.01216 *

The effect of having a dual task is significant for ALMOST on LIT and for PLURAL on SSI. It is nearly

significant for TWO on LIT and for SOME on PSI (although before correction for multiple

comparisons). No general pattern emerges and all we can say is that the effect for ALMOST on LIT

goes in the same direction as the effect that was documented for SOME: there are more LIT readings

in the dual-task version of the experiment.
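The text does not specify which correction for multiple comparisons is intended. Purely as an illustration, the sketch below applies the Holm-Bonferroni step-down procedure to the twelve p-values of the NO CL vs CL table, treating them as a single family. Both the choice of procedure and the choice of family are assumptions, and the function name is illustrative.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# p-values copied from the NO CL vs CL table (item, test).
reported = {
    ("SOME", "LIT"): 0.889, ("SOME", "PSI"): 0.09898, ("SOME", "SSI"): 0.4992,
    ("ALMOST", "LIT"): 0.001515, ("ALMOST", "PSI"): 0.6769, ("ALMOST", "SSI"): 0.9969,
    ("TWO", "LIT"): 0.08971, ("TWO", "PSI"): 0.9581, ("TWO", "SSI"): 0.5457,
    ("PLURAL", "LIT"): 0.8626, ("PLURAL", "PSI"): 0.6848, ("PLURAL", "SSI"): 0.01216,
}

labels = list(reported)
adjusted = dict(zip(labels, holm_adjust([reported[label] for label in labels])))
surviving = [label for label in labels if adjusted[label] < 0.05]
print(surviving)
```

Under these assumptions, only the ALMOST/LIT comparison stays below .05 after adjustment.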

Figure 4 – Effect of Cognitive Load on the answer, depending on the scalar item

We then tested the effect of the level of Cognitive Load (HIGH vs LOW):

LOW CL vs HIGH CL   ∅ vs L - LIT                 L vs LP - PSI              LP vs LPS - SSI
SOME                χ²(1)=3.044, p=0.08104       χ²(1)=0.1062, p=0.7445     χ²(1)=0.0959, p=0.7568
ALMOST              χ²(1)=2.3602, p=0.1245       χ²(1)=0.8089, p=0.3684     χ²(1)=0.0014, p=0.9702
TWO                 χ²(1)=4.2628, p=0.03896 *    χ²(1)=0.9796, p=0.3223     χ²(1)=1.0443, p=0.3068
PLURAL              χ²(1)=0.0729, p=0.7872       χ²(1)=0.9437, p=0.3313     χ²(1)=0.0333, p=0.8552

The results are not significant for PSI and SSI, whatever the scalar item tested. However, the effect is

significant for LIT for TWO (though only before correction for multiple comparisons). Contrary to our

hypothesis and to what was obtained by Marty & al., this result corresponds to an increase of LIT for

NUMERALS. Moreover, the effect of Cognitive Load is reversed for SOME (even if it is not significant):

the rate of literal answers decreases when the Cognitive Load is higher.
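As a reading aid for the tables, the p-value attached to a χ² statistic with one degree of freedom can be recomputed from the statistic alone. The sketch below does this with the Python standard library for three LIT cells of the LOW CL vs HIGH CL table; the helper name `chi2_1df_pvalue` is hypothetical, and the sketch says nothing about how the statistics themselves were obtained.

```python
import math


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# (statistic, reported p) pairs copied from the LIT column of the table.
reported = {
    "SOME": (3.044, 0.08104),
    "ALMOST": (2.3602, 0.1245),
    "TWO": (4.2628, 0.03896),
}

for item, (statistic, reported_p) in reported.items():
    recomputed = chi2_1df_pvalue(statistic)
    print(f"{item}: recomputed p = {recomputed:.4f}, reported p = {reported_p}")
```

The recomputed values should agree with the reported ones up to rounding.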

5.5. Discussion

This experiment had two main goals: first, to test the effect of cognitive load on the different readings established in Experiment 1, as a way to help localize the “cost” observed for SI computation in previous experiments; second, to compare the behavior of four scalar items by studying them on a new dimension.

As we did not replicate the findings of previous studies regarding the effect of Cognitive Load on the computation of SI (De Neys & Schaeken 2007, Marty & Chemla 2011, Marty & al., 2013), and as the statistical evidence is arguably weak (especially when “multiple comparisons” are taken into account), few conclusions can be drawn from the experiment. Even in condition L, which is the classical condition for testing SI, the difference between HIGH and LOW CL was not significant, except for TWO. Unexpectedly, for TWO, this corresponded to more “literal” readings under higher cognitive load, whereas the pattern was reversed (although not significantly) for ALMOST and SOME: this is the opposite of what Marty & al. had found.

Turning to the comparison between the dual-task version and the baseline with no dual task, the results go in the expected direction, even if overall they turn out not to be significant (except for ALMOST): whatever the scalar item, there are more LIT readings in the dual-task version of the experiment. However, comparing a dual-task and a no-dual-task experiment does not directly inform us about the involvement of memory resources: the cost could instead be due, for example, to the effect of switching between two tasks. It remains to be discussed how that type of interference could affect the derivation.

There are several possible explanations for the fact that we did not replicate previous results.

First, it is possible that our task was too easy: even if the results on the Letter Memory Task alone showed that 4-letter sequences were more demanding than 2-letter sequences, the overall rate of correct answers on the Memory Task was quite high. The difference between the two levels of Cognitive Load was perhaps not strong enough.

Second, contrary to Marty & al., we ran an online experiment, which meant that there were factors we could not control for. Among others, participants may have written the letters down (we removed the participant who explicitly indicated that he used this strategy, but we cannot be sure that other participants did not: we regard this as not very likely, but it is an example of what cannot be physically controlled in online experiments). Another factor that differed between our experiment and Marty & al.’s was the linguistic task: they used a graded judgment task, which is possibly more difficult than ours.

As we failed to replicate the result of previous studies, it was difficult to compare the cost for PSI and

SSI, which was the primary goal of the experiment. Nevertheless, our results strengthen the findings

of Experiment 1: we still detect three distinct readings for SOME and ALMOST, and only two for

PLURAL, even with a dual-task.

One might be worried by the fact that, for NUMERALS, we detect the PSI in the no-dual-task version of the experiment. This is not the case in the dual-task version, although the effect is nearly significant. The comparison suggests that even if the PSI exists for NUMERALS, it is hard to access. How can we explain the fact that we detect PSI in this version of the experiment? It may be due to the fact that, in this version, we had to modify the relative proportion of target and filler sentences: 1 out of 2 sentences was a filler. As argued by De Neys and Schaeken (2007), adding more target sentences can automatize the strengthening process: it offers more opportunities to come to a strengthened interpretation, and repetition helps to make a process cognitively less demanding. If the PSI depends on pragmatic mechanisms, this can account for the result: modifying the proportion of targets can make it more easily accessible. However, the same result is not obtained for PLURAL, suggesting that this may not be the only factor involved.

6. General discussion

Summary of the results

Most experimental studies of SI are based on the distinction between the literal meaning (LIT) and the implicated meaning (SI). In this work, we have established the existence of three levels of reading, LIT, PSI and SSI, for standard scalar items such as SOME.

This more fine-grained approach to SI enabled us to inform three specific debates, by applying our paradigm to three scalar items: ALMOST, NUMERALS and PLURAL. We compared their behavior to that of SOME, considered as a “standard” scalar item, regarding the existence of three distinct readings. This was a way to determine to what extent they could be analyzed as cases of scalar implicatures.

Regarding ALMOST, we detected three distinct readings, as for SOME. This is a new result which

strongly supports the SI account of this expression over others. The result further strengthens the

conclusion for SOME, making it more generalizable. Extending this to other scalar items would be

highly interesting.

Regarding NUMERALS and PLURAL, we only distinguish LIT and SSI with this paradigm. For

NUMERALS, this difference adds to previous findings suggesting that NUMERALS depart from

standard cases of SI. Our result does not go directly against the idea that the “exactly” reading (the

assumed strengthened meaning) would not derive from the “at least” reading (the assumed literal

reading), but shows at least that the mechanisms underlying the computation are different for

NUMERALS and SOME. For PLURAL, the conclusion is similar: this deserves more investigation.

Importantly, our main results have been replicated across four experiments. There are small variations in the rates of derivation across experiments (see Appendix 8 for a comparison between the four experiments), but the overall pattern is robust. Factors accounting for the small variability are diverse: subjects, other sentences tested in the experiment, relative proportion of fillers and targets, etc.

Consequences for the theories

Let us now turn to a more general and perhaps more central debate: the mechanism by which SI are generated. Importantly, both the Gricean and the Grammatical accounts can predict the existence of three levels of reading. Nonetheless, our results challenge the Gricean account on a specific point: we show that SSI can be accessed even when the speaker is presented as ignorant, which suggests that the “Epistemic Step” (from PSI to SSI) (Sauerland, 2004; van Rooij & Schultz, 2004) does not depend solely on the Competence Assumption. According to the Gricean account, SI computation is mostly pragmatic in nature, i.e. it rests on reasoning that takes into account the mental states of the speaker, including her informational state. On this view, it is difficult to understand why the SSI could be derived when the speaker cannot be assumed to have an opinion on the truth value of the stronger alternative.

Open questions

One way to better inform the debate between these two accounts and to strengthen our conclusions would be to study the processing properties of the readings, an important piece of work that remains to be done. Indeed, establishing the psychological reality of the three readings does not directly tell us how they are related. Orthogonally, having a more fine-grained approach to SI (distinguishing PSI and SSI) will help to better understand the “cost” associated with the computation of SI. One way to achieve this could be to study response times in a more controlled way, using a training phase, as in the paradigm implemented by Bott & Noveck (2004) or Cremers & Chemla (to appear).

To conclude, even if one of the guiding lines of our work was to inform the current debate between the Gricean and the Grammatical accounts, our results are of interest beyond theoretical debates: they put constraints on what these theories should or should not predict.

Appendices

1. Abbreviations

These are the main abbreviations used:

Readings

SI Scalar Implicatures

LIT Literal Reading

PSI Primary Scalar Implicature

SSI Secondary Scalar Implicature

(S) a sentence containing a scalar item. Ex: “Some cards are hearts.”

(S’) the stronger alternative to this sentence. Ex: “All cards are hearts.”

B ( x ) The speaker believes that (x)

Names of conditions:

with 4 conditions

∅ the picture makes no reading true

L the picture makes LIT true

LP the picture makes LIT and PSI true but not SSI

LPS the picture makes LIT, PSI and SSI true

with detailed conditions

Control conditions

∅-KN Expected answer: “no”. The speaker is fully informed.

∅-IGNO-t Expected answer: “no”. The speaker is partially informed. The sentence is actually true.

∅-IGNO-f Expected answer: “no”. The speaker is partially informed. The sentence is actually false.

LPS-KN Expected answer: “yes”. The speaker is fully informed.

LPS-IGNO Expected answer: “yes”. The speaker is partially informed.

Target conditions

L Expected answer: “yes” if LIT, “no” if PSI or SSI

LP-t Expected answer: “yes” if LIT or PSI, “no” if SSI. The sentence is actually true.

LP-a Expected answer: “yes” if LIT or PSI, “no” if SSI. The sentence is actually ambiguous.

2. Pilot Experiment

Method and material

The experiment was a truth-value judgment task. At each trial, four cards and a sentence were

presented to the participant. We manipulated the level of information of the speaker by putting

some of the cards face-down. The question asked was: “Can Peter say that?”.

We tested two versions of the experiment:

(a) with two possible answers: TRUE / FALSE

(b) with three possible answers: TRUE / FALSE / NOT ENOUGH INFORMATION

Sentences

Three scalar items were tested: SOME “Some of the cards are hearts.”

NUMERALS “Two/ Three cards are hearts.”

PLURAL “There are cards which are hearts.”

Conditions

Four conditions were tested: ∅: the sentence is false whatever the reading

L: the sentence is true with LIT only

LP: the sentence is true with LIT and PSI but not with SSI

LPS: the sentence is true whatever the reading

∅ and LPS corresponded to control conditions, L and LP to target conditions.

Conditions are named after the initial of the reading(s) the picture makes true.

The expected answers for the conditions are summed up in table 1 below.

Example: Mary: “Some of the cards are hearts.”

                   Control conditions                                Target conditions
Conditions         ∅ (i)      ∅ (ii)                    LPS          L          LP
Cards displayed    ♤ ♤ ♤ ♤    ♤ ♤ ? ?                   ♥ ♥ ♤ ♤      ♥ ♥ ♥ ♥    ♥ ♥ ? ?
LIT                FALSE      NOT ENOUGH INFORMATION    TRUE         TRUE       TRUE
PSI                FALSE      NOT ENOUGH INFORMATION    TRUE         FALSE      TRUE
SSI                FALSE      NOT ENOUGH INFORMATION    TRUE         FALSE      NOT ENOUGH INFORMATION

NB: the ∅ control condition is divided into two sub-conditions depending on the reason why the

sentence can be rejected: (i) because the sentence is false or (ii) because the speaker does not have

enough information to say that. In the 2-answers version of the pilot, the expected answer for both

(i) and (ii) was FALSE ; in the 3-answers version of the pilot, the expected answer for (i) and (ii) was

respectively FALSE and NOT ENOUGH INFORMATION.
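The expected answers of Table 1 (3-answers version, sentence “Some of the cards are hearts”) can be derived mechanically from what the speaker can see. The sketch below is one possible formalization, not the procedure used to build the materials: it assumes that the speaker believes exactly what follows from the visible cards, that PSI adds "the speaker does not believe (S’)", and that SSI adds "the speaker believes that (S’) is false". The card encoding ('H', 'S', '?') and the function name are hypothetical.

```python
TRUE, FALSE, NEI = "TRUE", "FALSE", "NOT ENOUGH INFORMATION"


def expected_answers(cards: str) -> dict:
    """Expected answer under each reading of "Some of the cards are hearts".

    cards: one character per card, 'H' = heart, 'S' = another suit, '?' = face-down.
    """
    hidden = "?" in cards
    believes_some = "H" in cards                          # B(S)
    believes_not_some = not hidden and "H" not in cards   # B(not S)
    believes_all = not hidden and set(cards) == {"H"}     # B(S')
    believes_not_all = "S" in cards                       # B(not S')

    if believes_not_some:
        # The speaker knows the sentence is false: rejected under every reading.
        return {"LIT": FALSE, "PSI": FALSE, "SSI": FALSE}
    if not believes_some:
        # The speaker cannot tell whether the sentence is true.
        return {"LIT": NEI, "PSI": NEI, "SSI": NEI}

    psi = FALSE if believes_all else TRUE
    if believes_not_all:
        ssi = TRUE
    elif believes_all:
        ssi = FALSE
    else:
        ssi = NEI
    return {"LIT": TRUE, "PSI": psi, "SSI": ssi}


# The five displays of Table 1: ∅ (i), ∅ (ii), LPS, L, LP.
for display in ["SSSS", "SS??", "HHSS", "HHHH", "HH??"]:
    print(display, expected_answers(display))
```

The two target conditions are the ones where the readings come apart: L ("HHHH") separates LIT from PSI and SSI, and LP ("HH??") separates SSI from LIT and PSI.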

Table 2 below presents the cards displayed in each condition depending on the scalar item tested:

Conditions Control conditions Target conditions

SOME ♤ ♤ ♤ ♤ ♤ ♤ ? ? ♥ ♥ ♤ ♤

/ ♥ ♥ ♥ ♤ ♥ ♥ ♥ ♥

♥ ♥ ? ? / ♥ ♥ ♥ ?

TWO ♤ ♤ ♤ ♤ ♤ ♤ ? ? ♥ ♥ ♤ ♤

♥ ♥ ♥ ♥

/ ♥ ♥ ♥ ♤

/ ♥ ♥ ♥ ? ♥ ♥ ? ?

THREE ♤ ♤ ♤ ♤ ♤ ? ? ? ♥ ♥ ♥ ♤ ♥ ♥ ♥ ♥ ♥ ♥ ♥ ?

PLURAL ♤ ♤ ♤ ♤ ♤ ♤ ? ? ♥ ♥ ♤ ♤ ♥ ♤ ♤ ♤ ♥ ? ? ?

* This case made it possible to see whether the likelihood of the sentence with ALL being true could increase the rate of derivation of the implicature (when 1 card is hidden, it is 25% likely that all cards are hearts; when 2 cards are hidden, it is 6.25% likely that all cards are hearts). Results showed that this had no effect.
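The two percentages in the note follow from a simple computation, sketched below under the assumption (not stated explicitly in the text) that each hidden card is independently and equally likely to be any of four suits, and that all visible cards are hearts. The function name is hypothetical.

```python
from fractions import Fraction


def prob_all_hearts(n_hidden: int, n_suits: int = 4) -> Fraction:
    """Probability that every hidden card is a heart, given that the visible ones are.

    Assumes independent, equiprobable suits for each hidden card.
    """
    return Fraction(1, n_suits) ** n_hidden


for n_hidden in (1, 2):
    p = prob_all_hearts(n_hidden)
    print(f"{n_hidden} hidden card(s): {p} = {float(p):.2%}")
```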

Results

31 participants were recruited via Mechanical Turk for the 3-answers version of the experiment, and

28 for the 2-answers version. Results for the target conditions are presented below.

In the 2-answers version, we failed to detect SSI (there was no answer “false” in condition LP).

In the 3-answers version, we detected SSI (answers “not enough information” in condition LP).

Overall, the rate of LIT was high (“yes” answers in condition L). That made it difficult to really differentiate PSI from SSI. We thus tried, in the improvements of the pilot, to find ways to increase the rate of implicated readings (i.e. to reduce literal readings). We finally found that a critical factor was the number of cards presented: with eight cards, there are more SI and fewer LIT readings.

[Figure: results for the target conditions, with one panel each for SOME, NUMERALS and PLURAL]

♥: card of the target color

♤: card of another color

?: ignorance card

3. Experiment 2: testing the influence of presenting the stronger alternative

One of the problems of our pilot was that the rate of LIT was quite high (i.e. the rate of SI was quite low), which made it difficult to really differentiate PSI from SSI.

In particular, one factor known to increase the rate of SI is the presentation of sentences containing the stronger alternative (i.e., when testing the scalar item SOME, sentences with ALL). This has been shown by Grodner (unpublished): in a truth value judgment task similar to that of Bott & Noveck (2004), he showed that the rate of SI increased monotonically with the proportion of ALL (vs NO) in fillers.

Beyond this, the real goal of the experiment was to test whether this manipulation affected the rates of PSI and SSI in the same way. According to the Gricean account, the generation of the alternative is involved at the first step of the computation, between LIT and PSI. The manipulation of the stronger alternative, whatever the exact mechanism involved, affects the ease with which alternatives are accessed. According to the Gricean schema, this manipulation should therefore affect the rate of PSI only, and not the rate of SSI.

Results: As the rate of LIT was too high, we could not distinguish between PSI and SSI, and the effect of presenting the alternative was therefore uninterpretable. We found afterwards that this was likely linked, as in the pilot, to the number of cards presented (4 cards, whereas there were 8 cards in experiment 1).

Method and materials

Except for the number of cards and the sentences tested, the paradigm was exactly the same as in experiment 1. Only SOME was tested.

Sentences

The sentence was always of the form “X cards are [hearts]”.

X could be: “some”, “all”, “some or all”, “no”.

In experiment 2a, we tested only sentences containing SOME (60 sentences).

There were also control sentences with NO, in order to check that the participants correctly understood the meaning of the ? cards (12 sentences).

In experiment 2b, we presented sentences containing SOME (60 sentences) and sentences

containing its scalar alternative ALL (60 sentences).

There were also control sentences with NO (12 sentences).

In experiment 2c, we presented sentences containing SOME (60 sentences) and sentences containing SOME OR ALL (60 sentences; see footnote 13).

There were also control sentences with NO (12 sentences).

13 The hypothesis concerning the effect of presenting SOME OR ALL was the same as for ALL: it was supposed to increase only the PSI readings. From a theoretical point of view, the expression SOME OR ALL forces the hearer to derive the implicature for SOME. Indeed, according to Hurford’s constraint, “A or B” is infelicitous when B entails A (e.g. # “Mary saw an animal or a dog”). In the case of SOME OR ALL (A = SOME and B = ALL), ALL entails SOME unless the implicature for SOME is derived. A hearer is thus forced to understand SOME as meaning SOME BUT NOT ALL in this context. This should increase the rate of SI derived for sentences with SOME in the experiment. We were also interested in the results for the sentences containing SOME OR ALL themselves: they made it possible to see whether the implicature was derived for OR.

Pictures

The picture was composed of two sets of four cards,

as shown in the example on the right.

Conditions

The cards displayed corresponded to four conditions, depending on the readings they made true. As

in experiment 1, ∅ and LPS were control conditions; L and LP were target conditions.

The procedure was the same as the one we presented in experiment 1. Only the number of trials in

the experimental phase differed (72 for 2a, 132 for 2b and 2c).

               Control conditions                               Target conditions
SOME           ♤ ♤ ♤ ♤              ♤ ♤ ? ?      ♥ ♥ ♤ ♤        ♥ ♥ ♥ ♥      ♥ ♥ ? ?
ALL            ♤ ♤ ♤ ♤ / ♥ ♥ ♤ ♤    ♥ ♥ ? ?      ♥ ♥ ♥ ♥        -            -
SOME OR ALL    ♤ ♤ ♤ ♤              ♤ ♤ ? ?      ♥ ♥ ? ?        ♥ ♥ ♥ ♥      ♥ ♥ ♤ ♤
NO             ♥ ♥ ♤ ♤              ♤ ♤ ? ?      ♤ ♤ ♤ ♤        -            -

Participants

180 participants were recruited via Mechanical Turk; 178 completed the task. We removed from the analysis 6 participants whose native language was not English, 12 who made too many errors on ∅ and LPS controls and 4 who made too many errors on NO-controls. We thus had 156 participants (2a: 52; 2b: 55; 2c: 50; 83 males, 73 females; mean age: 37.4, range: 19 to 74 years old).

Results

The figure below shows the proportion of “yes” answers for SOME depending on the experiment.

As the graph shows, we hardly detect the SSI. We conducted the same tests as in experiment 1. The table below summarizes the results.

      ∅ vs L                            L vs LP                             LP vs LPS
2a    χ²(1)=93.77, p<2.2e-16 ***        χ²(1)=48.205, p=3.838e-12 ***       χ²(1)=21.659, p=3.257e-06 ***
2b    χ²(1)=54.851, p=1.3e-13 ***       χ²(1)=0.477, p=0.4898               χ²(1)=4.496, p=0.03397 *
2c    χ²(1)=113.63, p<2.2e-16 ***       χ²(1)=5.6503, p=0.01745 *           χ²(1)=3.0859, p=0.07898

In contrast to the results of experiment 1, we do not detect the three readings in experiments 2b and 2c. To assess the effect of presenting ALL sentences, we tested the interaction between experiment (2a/2b/2c) and condition. Results were not significant, as shown below.

            ∅ vs L                      L vs LP                     LP vs LPS
2a vs 2b    χ²(1)=0.1172, p=0.7321      χ²(1)=0.4408, p=0.5068      χ²(1)=0.1172, p=0.7321
2a vs 2c    χ²(1)=0.2885, p=0.5912      χ²(1)=0.1362, p=0.7121      χ²(1)=0.0604, p=0.8059
2b vs 2c    χ²(1)=0.2498, p=0.6172      χ²(1)=0.09, p=0.3876        χ²(1)=0.2579, p=0.6116

Conclusion

We believe that our results can be explained by the number of cards presented (4 cards, as in the pilot, vs 8 cards in experiment 1). Other factors could be involved, for example the environment (effect of testing other scalar items in experiment 1).

It would be highly interesting to explore this question further and to run this experiment with 8 cards (we did not because we were more interested in the processing of the readings). Indeed, it would make it possible to test another prediction of the Gricean account, namely that the alternatives are involved during the step from LIT to PSI (which presupposes that there exists a step from LIT to PSI).

Note that another modification should be made: keeping the overall number of sentences in the experiment constant between 2a and 2b/2c. Indeed, the proportion of targets relative to fillers is another factor that may come into play, and in this version of the experiment it was higher in 2a than in 2b and 2c.

4. Displayed cards depending on the condition (Experiment 1a, 1b and dual-task)

Mary: “X cards are [hearts]”.

TARGET CONDITIONS CONTROL CONDITIONS

4 condi L LP ∅ LPS

KN condi KN IGNO (i) KN (ii) IGNO KN IGNO

8 condi - The sentence

is true

The sentence

is ambiguous -

The sentence

is true

The sentence

is false - -

Player Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter

SOME ♥♥♥♥

♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

♥♥♥♤

???♤

♥♥♥♤

PLURAL ♥♤♤♤

♤♤♤♤

♥♤♤♤

♥♥♥♤

♥♤♤♤

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

♥♥♥♤

???♤

♥♥♥♤

TWO ♥♥♥♥

♤♤♤♤

♥♥♥♥

♥♥♤♤

♤♤♤♤

♥♥♤♤

♤♤♤♤

♥♤♤♤

♥♥♤♤

♤♤♤♤

♥♥♤♤

???? - -

ALMOST

♥♥♥♥

♥♥♥?

♥♥♥♥

♥♥♥♤

♥♥♥♥

♥♥♥?

♥♥♥♥

♤♤♤♤

♤???

♤♥♥♥

♥♥♥♥

♤???

♤♤♤♤

♥♥♥♥

♥♥♥♤

♥♥♥♥

???? - -

ALMOST

♤♤♤♤

♤♤♤?

♤♤♤♤

♤♤♤♥

♤♤♤♤

♤♤♤?

♤♤♤♤

♥♥♥♥

♥???

♥♤♤♤

♤♤♤♤

♥???

♥♥♥♥

♥♤♤♤

♤♤♤♤

♥♤♤♤

???? - -

ALL - - - - - - ♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

???? - -

NO - - - - - - ♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

♥♥♥♥

♤♤♤♤

???? - -

FIRST - - - - - - ♤♤♥♥

♤♤♥♥

??♥♥

??♥♥ - -

??♥♥

♤♤♥♥

♥♥♤♤

LAST - - - - - - ♥♥♤♤

♥♥♤♤

♥♥??

♥♥?? - -

♥♥??

♥♥♤♤

♤♤♥♥

KN: the speaker is knowledgeable.

IGNO: the speaker is ignorant.

(i) “Mary cannot say that” because the sentence is false and she knows it.

(ii) “Mary cannot say that” because she does not have enough information to say that.

Remark: LP always corresponds to L with some cards put face down.

5. Number of trials by condition (Experiments 1a, 1b and dual-task)

                                 TARGET CONDITIONS         CONTROL CONDITIONS
Condition                        L     LP     LP       ∅     ∅      ∅             LPS    LPS
Speaker’s knowledge condition    KN    IGNO   IGNO     KN    IGNO   IGNO          KN     IGNO
Truth value of the sentence      -     true   false    -     true   true/false    -      -
(card of the other player)

Experiment 1a (Examples: 18; Trials: 288; Total: 306)
SOME                             8     4      4        8     4      4             16     -
PLURAL                           8     4      4        8     4      4             16     -
TWO                              8     4      4        8     4      4             16     -
ALMOST ALL                       8     4      4        8     4      4             16     -
ALMOST NO                        8     4      4        8     4      4             16     -
ALL                              -     -      -        4     4      4             12     -
NO                               -     -      -        4     4      4             12     -

Experiment 1b (Examples: 18; Trials: 320; Total: 338)
SOME                             8     4      4        8     4      4             8      8
PLURAL                           8     4      4        8     4      4             8      8
TWO                              8     4      4        8     4      4             16     -
ALMOST ALL                       8     4      4        8     4      4             16     -
ALMOST NO                        8     4      4        8     4      4             16     -
ALL                              -     -      -        4     4      4             12     -
NO                               -     -      -        4     4      4             12     -
FIRST                            -     -      -        4     -      4             4      4
LAST                             -     -      -        4     -      4             4      4

Dual-task experiment (Examples: 4+4; Trials: 184; Total: 192)
SOME                             8     4      4        4     2      2             4      4
PLURAL                           8     4      4        4     2      2             4      4
TWO                              8     4      4        4     2      2             8      -
ALMOST ALL                       8     4      4        4     2      2             8      -
ALMOST NO                        8     4      4        4     2      2             8      -
ALL                              -     -      -        2     2      2             6      -
NO                               -     -      -        2     2      2             6      -

* to be divided by 2, according to the two Cognitive Load conditions
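The trial totals of this appendix can be checked by summing the cells of each block. The sketch below does so; the assignment of the three blocks to Experiment 1a, Experiment 1b and the dual-task experiment follows the order given in the appendix title and is an assumption, as is entering "-" cells as 0.

```python
# Cells per row, in column order:
# L-KN, LP-IGNO, LP-IGNO, ∅-KN, ∅-IGNO, ∅-IGNO, LPS-KN, LPS-IGNO ("-" entered as 0).
FULL = [8, 4, 4, 8, 4, 4, 16, 0]       # scalar item, LPS tested with KN only
FULL_SPLIT = [8, 4, 4, 8, 4, 4, 8, 8]  # scalar item, LPS split between KN and IGNO
FILLER = [0, 0, 0, 4, 4, 4, 12, 0]     # ALL / NO
ORDINAL = [0, 0, 0, 4, 0, 4, 4, 4]     # FIRST / LAST
DUAL = [8, 4, 4, 4, 2, 2, 8, 0]
DUAL_SPLIT = [8, 4, 4, 4, 2, 2, 4, 4]
DUAL_FILLER = [0, 0, 0, 2, 2, 2, 6, 0]

EXPERIMENTS = {
    "1a": {"examples": 18, "rows": {
        "SOME": FULL, "PLURAL": FULL, "TWO": FULL,
        "ALMOST ALL": FULL, "ALMOST NO": FULL, "ALL": FILLER, "NO": FILLER}},
    "1b": {"examples": 18, "rows": {
        "SOME": FULL_SPLIT, "PLURAL": FULL_SPLIT, "TWO": FULL,
        "ALMOST ALL": FULL, "ALMOST NO": FULL, "ALL": FILLER, "NO": FILLER,
        "FIRST": ORDINAL, "LAST": ORDINAL}},
    "dual-task": {"examples": 4 + 4, "rows": {
        "SOME": DUAL_SPLIT, "PLURAL": DUAL_SPLIT, "TWO": DUAL,
        "ALMOST ALL": DUAL, "ALMOST NO": DUAL, "ALL": DUAL_FILLER, "NO": DUAL_FILLER}},
}


def trial_count(rows: dict) -> int:
    """Sum of all cells of a block, i.e. the number of experimental trials."""
    return sum(sum(cells) for cells in rows.values())


for name, block in EXPERIMENTS.items():
    trials = trial_count(block["rows"])
    print(name, "trials:", trials, "total with examples:", trials + block["examples"])
```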

6. Instructions (experiment 1a and 1b)

Peter and Mary are playing a card game.

At each round, eight cards are put on the table. Some cards can be seen by both Peter and Mary, and some other cards can be seen only by Peter (or only by Mary).

After they have looked at the cards, Peter (or Mary) makes a statement about the cards.

Your task is to indicate if Peter (or Mary) could say what he (or she) said, on the basis of his (or her)

information.

Here are some examples:

Example 1: Mary: "All of the cards are spades."

Can Mary say that?

NO, because even if in fact the sentence is true, she does not have enough information to say that.

Example 2: Peter: "All of the cards are hearts."

Can Peter say that?

NO, because this is false (and he has enough information to know that it's false).

Example 3: Mary: "All of the cards are clubs."

Can Mary say that?

YES, because she can be sure the sentence is true.

Training with feedback:

Five unambiguous sentences were used, listed below. The order of the sentences was randomized.

- “All of the cards are [hearts].” (3)

- “The first* card is a [heart].” (3)

- “Fewer than five cards are [hearts].” (3)

- “There is the same number of [hearts] and [spades].” (3)

- “The last* card is a [heart].” (2)

7 were attributed to Mary, 7 to Peter. 5 expected answer: “yes” ; 5 expected answer: “no” because

not enough information; 3 expected answer: “no” because false.

*: In experiment 1b, as we used sentences with FIRST and LAST, we changed these examples.

7. Instructions (dual task)

Instructions (1/2) (Truth value judgment task)

This part of the instructions was the same as in experiment 1.

Training with feedback (Truth value judgment task)

Four unambiguous sentences were used, listed below. The order was randomized.

- “All of the cards are hearts.”

- “The second card is a club.”

- “Fewer than five cards are hearts.”

- “There is the same number of spades and diamonds.”

Instructions (2/2) (Letter Memory Task)

That's not all:

Before each of these questions, you will be shown random letters.

Remember them: after you have seen the cards and given your answer, you will be asked to

reproduce the same sequence of letters in reverse order.

For example, you may see ABCD, then you will answer a question about a round of cards, and then

you will be asked to reproduce the sequence of letters in reverse order, here: DCBA.

Please give your answer IN CAPITAL LETTERS, without leaving any space between the letters.

It is very important that you memorize correctly these letters: stay focused!
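For illustration, the response expected in the Letter Memory Task, and one possible way of scoring it, can be sketched as follows. The strict scoring rule (exact match, capital letters, no spaces) is an assumption based on the instructions above, not a description of the scoring actually used; the function names are hypothetical.

```python
def expected_response(sequence: str) -> str:
    """The sequence of letters in reverse order, e.g. ABCD -> DCBA."""
    return sequence[::-1]


def is_correct(sequence: str, response: str) -> bool:
    """Strict scoring: the response must match the reversed sequence exactly."""
    return response == expected_response(sequence)


print(expected_response("ABCD"))
print(is_correct("ABCD", "DCBA"))
print(is_correct("ABCD", "D C B A"))
```

A more lenient rule (for instance, ignoring case or spaces) would be a different design choice and could change the measured accuracy on the memory task.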

Training with feedback (Letter Memory Task)

The four same sentences were used.

The number of letters presented in the training sequence depended on the order of the blocks for

the participant (LOW CL - HIGH CL vs HIGH CL – LOW CL).

8. Comparison between experiments

The figure below illustrates the proportion of the three readings across experiments, for each scalar item.

LIT corresponds to the rate of “yes” answers in condition L.

PSI corresponds to the rate of “yes” answers in condition LP minus the rate of “yes” answers in condition L.

SSI corresponds to the rate of “no” answers in condition LP.
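These three definitions amount to a simple computation on the rates of “yes” answers in conditions L and LP, sketched below. The sketch assumes a two-answer task, so that the rate of “no” answers in LP is one minus the rate of “yes” answers; the rates used in the example are invented for illustration and are not results from the experiments.

```python
def reading_rates(yes_rate_L: float, yes_rate_LP: float) -> dict:
    """Split answers into LIT, PSI and SSI from the "yes" rates in conditions L and LP."""
    return {
        "LIT": yes_rate_L,                 # "yes" in condition L
        "PSI": yes_rate_LP - yes_rate_L,   # "yes" in LP minus "yes" in L
        "SSI": 1.0 - yes_rate_LP,          # "no" in condition LP
    }


# Hypothetical rates, for illustration only.
rates = reading_rates(yes_rate_L=0.30, yes_rate_LP=0.75)
print(rates)
print(sum(rates.values()))
```

By construction the three rates sum to 1, which is why they can be displayed as proportions of a single total.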

Our results have been replicated across four experiments: we detect three distinct readings for SOME and ALMOST, and only two for NUMERALS and PLURAL. There are small variations in the rates of derivation of the different readings across experiments, but the overall pattern is always the same. Factors accounting for the variability are diverse: subjects, other sentences tested in the experiment, relative proportion of fillers and targets, etc.

References

Baddeley, A. (1992). Working memory. Science, 255(5044), 556-559.

Bale, A., & Barner, D. (2013). Grammatical alternatives and pragmatic development. In Alternatives in Semantics, ‘Studies in Pragmatics, Language and Cognition’. New York: Palgrave Macmillan, 238-66.

Barner, D. & Bachrach, A. (2010). Inference and exact numerical representation in early language development. Cognitive

Psychology, 60, 40–62.

Barner, D., Brooks, N., & Bale, A. (2011). Accessing the unsaid: The role of scalar alternatives in children's pragmatic

inference. Cognition, 188, 87-96.

Bergen, L., & Grodner, D. J. (2012). Speaker knowledge influences the comprehension of pragmatic inferences. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 38, 1450-1460.

Bott, L., & Noveck, I. A. (2004). Some utterances are underinformative: The onset and time course of scalar

inferences. Journal of Memory and Language, 51(3), 437-457.

Bott, L., Bailey, T. M., & Grodner, D. (2012). Distinguishing speed from accuracy in scalar implicatures. Journal of Memory

and Language, 66(1), 123-142.

Breheny, R., Ferguson, H. J., & Katsos, N. (2013). Taking the epistemic step: toward a model of on-line access to

conversational implicatures. Cognition, 126(3), 423-440.

Breheny, R., Katsos, N., & Williams, J. (2006). Are generalised scalar implicatures generated by default? An on-line

investigation into the role of context in generating pragmatic inferences. Cognition, 100(3), 434-463.

Chemla, E., & Singh, R. (2014). Remarks on the experimental turn in the study of scalar implicature, Part I. Language and

Linguistics Compass, 8(9), 373-386.

Chemla, E., & Spector, B. (2011). Experimental evidence for embedded scalar implicatures. Journal of semantics, ffq023.

Chierchia, G. (2004). Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. Structures and

beyond, 3, 39-103.

Chierchia, G. (2006). Broaden your views: Implicatures of domain widening and the “logicality” of language. Linguistic

inquiry, 37(4), 535-590.

Chierchia, G., Crain, S., Guasti, Maria T., Gualmini, A., & Meroni, L. (2001). The Acquisition of Disjunction: Evidence for a

Grammatical View of Scalar Implicatures. Proceedings of the 25th Boston University Conference on Language Development.

Somerville: Cascadilla Press, 157-168.

Chierchia, G., Fox, D., & Spector, B. (2008). The grammatical view of scalar implicatures and the relationship between

semantics and pragmatics. Unpublished manuscript.

Chierchia, G., Fox, D., & Spector, B. (2009). Hurford’s constraint and the theory of scalar implicatures. Presuppositions and

implicatures, 60, 47-62.

De Neys, W. (2006). Dual processing in reasoning: Two systems but one reasoner. Psychological Science, 17(5), 428-433.

De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology

(formerly Zeitschrift für Experimentelle Psychologie), 54(2), 128-133.

Degen, J., & Tanenhaus, M. K. (2011). Making inferences: the case of scalar implicature processing. In Proceedings of the

33rd annual conference of the cognitive science society (pp. 3299-3304). Austin, TX: Cognitive Science Society.

Dieussaert, K., Verkerk, S., Gillard, E., & Schaeken, W. (2011). Some effort for some: further evidence that scalar

implicatures are effortful. The Quarterly Journal of Experimental Psychology, 64(12), 2352-2367.

Engle, R. W. (2002). Working memory capacity as executive attention. Current directions in psychological science, 11(1), 19-

Lasersohn, P. (1995). Plurality, conjunction and events.

Fox, D. (2007). Free choice and the theory of scalar implicatures. In Uli Sauerland and Penka Stateva (Eds.), Presupposition

and implicature in compositional semantics. Houndmills, Basingstoke: Palgrave Macmillan.

Fox, D. (2014). Cancelling the Maxim of Quantity: Another challenge for a Gricean theory of scalar implicatures. Semantics

and Pragmatics, 7, 5-1.

Gazdar, G. (1979). Pragmatics: Implicature, presupposition, and logical form. New York: Academic Press.

Geurts, B. (2009). Scalar implicature and local pragmatics. Mind & Language, 24(1), 51-79.

Grice, P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics, Volume 3. New York: Academic Press.

Grodner, D. J., Klein, N. M., Carbary, K. M., & Tanenhaus, M. K. (2010). “Some,” and possibly all, scalar inferences are not

delayed: Evidence for immediate pragmatic enrichment. Cognition, 116(1), 42-55.

Hirschberg, J. L. B. (1985). A theory of scalar implicature. University of Pennsylvania.

Hitzeman, J. (1992). The selectional properties and entailments of almost. In Papers from the 28th Regional Meeting of the

Chicago Linguistic Society (pp. 225-238).

Hochstein, L., Bale, A., Fox, D., & Barner, D. (2014). Ignorance and Inference: Do Problems with Gricean Epistemic Reasoning

Explain Children’s Difficulty with Scalar Implicature?. Journal of Semantics, ffu015.

Horn, L. (1972). On the Semantic Properties of Logical Operators in English. Ph.D. dissertation. University of California. Los

Angeles, CA.

Horn, L. R. (2011). Almost forever. Pragmatics and autolexical grammar: In honor of Jerry Sadock, 3-21.

Huang, Y. T., & Snedeker, J. (2009). Online interpretation of scalar quantifiers: Insight into the semantics–pragmatics

interface. Cognitive psychology, 58(3), 376-415.

Huang, Y. T., & Snedeker, J. (2009). Semantic meaning and pragmatic interpretation in 5-year-olds: evidence from real-time

spoken language comprehension. Developmental psychology, 45(6), 1723.

Kilbourn-Ceron, O. (2015). Embedded exhaustification: evidence from almost.

Landman, F. (1998). Plurals and maximalization (pp. 237-271). Springer Netherlands.

Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicature. Cambridge, MA: MIT

press.

Marty, P., & Chemla, E. (2013). Scalar implicatures: working memory and a comparison with only. Frontiers in psychology, 4.

Marty, P., Chemla, E., & Spector, B. (2013). Interpreting numerals and scalar items under memory load. Lingua, 133, 152-

Miyake, A., & Shah, P. (Eds.). (1999). Models of working memory: Mechanisms of active maintenance and executive control.

Cambridge University Press.

Noveck, I. (2001). When children are more logical than adults: Experimental investigations of scalar implicature. Cognition,

78, 165-188.

Noveck, I. A., & Posada, A. (2003). Characterizing the time course of an implicature: An evoked potentials study. Brain and

language, 85(2), 203-210.

Papafragou, A., & Musolino, J. (2003). Scalar implicatures: experiments at the semantics-pragmatics interface. Cognition,

86, 253-282.

Pearson, H. A., Khan, M., & Snedeker, J. (2011, March). Even more evidence for the emptiness of plurality: An experimental

investigation of plural interpretation as a species of implicature. In Proceedings of SALT (Vol. 20, pp. 489-507).

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing,

Vienna, Austria. URL http://www.R-project.org/.

Rips, L. J. (1975). Inductive judgments about natural categories. Journal of verbal learning and verbal behavior, 14(6), 665-

Russell, B. (2006). Against grammatical computation of scalar implicatures. Journal of semantics, 23(4), 361-382.

Sadock, J. (1981). Almost. Radical pragmatics, 257-271.

Sauerland, U. (2004). Scalar Implicatures in Complex Sentences. Linguistics and Philosophy, 27, 367-391.

Sauerland, U. (2012). The computation of scalar implicatures: pragmatic, lexical or grammatical?. Language and Linguistics

Compass, 6(1), 36-49.

Sauerland, U., Anderssen, J., & Yatsushiro, K. (2005). The plural is semantically unmarked. Linguistic evidence, 413-434.

Smith, C. L. (1980). Quantifiers and question answering in young children. Journal of Experimental Child Psychology, 30(2),

191-205.

Spector, B. (2003). Scalar Implicatures: Exhaustivity and Gricean Reasoning. Proceedings of the ESSLLI, 3, 277-288.

Spector, B. (2007). Aspects of the pragmatics of plural morphology: On higher-order implicatures. Presuppositions and

implicatures in compositional semantics, 243-281.

Spector, B. (2013). Bare numerals and scalar implicatures. Language and Linguistics Compass, 7(5), 273-294.

Tieu, L., Bill, C., Romoli, J., & Crain, S. (2014, September). Plurality inferences are scalar implicatures: Evidence from

acquisition. In Semantics and Linguistic Theory (Vol. 24, pp. 122-136).

Tomlinson, J., Bott, L., & Bailey, T. (2011). Understanding literal meanings before pragmatic inference: Mouse-trajectories of

scalar implicatures. In th Biennial Conference of Experimental Pragmatics, June (pp. 2-4).

Van Rooij, R., & Schulz, K. (2004). Exhaustive interpretation of complex sentences. Journal of logic, language and

information, 13(4), 491-519.

Van Tiel, B., Van Miltenburg, E., Zevakhina, N., & Geurts, B. (2014). Scalar diversity. Journal of Semantics, ffu017.

Zweig, E. (2008). Dependent plurals and plural meaning. ProQuest.
