CogMaster – ENS / EHESS / Université Paris Descartes
Laboratoire de Sciences Cognitives et Psycholinguistique
Institut Jean Nicod
Experimental study of Primary and Secondary
Scalar Implicatures
Anouk Dieuleveut
under the supervision of
Emmanuel Chemla and Benjamin Spector
June 6, 2015
Contents
Acknowledgments
Originality statement
Contribution statement
Abstract
1. Introduction: theoretical background on Scalar Implicatures
1.1. The two main accounts of Scalar Implicatures
1.1.1. The Gricean account: distinguishing three levels of reading
1.1.2. The Grammatical account
1.1.3. Comparison of the two accounts concerning the distinction between PSI and SSI
1.2. The experimental study of SI as a way to inform theoretical debates
1.2.1. Previous experiments and methodological considerations
1.2.2. Goal 1: a more fine-grained approach of SI
1.2.3. Goal 2: studying the processing patterns of PSI and SSI
1.2.4. Related studies
1.3. “Scalar diversity”
1.3.1. Not just testing SOME: why study a broader range of implicatures?
1.3.2. Informing theoretical debates on three expressions
1.3.3. Summing up
2. Work done during the internship
3. Experiment 1a: existence of primary and secondary implicatures for different scalar items
3.1. Goal
3.2. Method and materials
Experimental items
Procedure
3.3. Participants
3.4. Results
Analysis of responses
Readings by subjects
Response times (description)
4. Experiment 1b: control experiment
4.1. Goal
4.2. Method and materials
4.3. Participants
4.4. Results
Analysis of responses
Readings by subjects
Response times
4.5. Conclusion
Discussion for Experiment 1a and 1b
5. Experiment 2: dual-task impact on primary and secondary implicatures
5.1. Goal
5.2. Method and materials
Truth Value Judgment Task
Letter Memory Task
Procedure
5.3. Participants
5.4. Results
Letter Memory Task
Truth value judgment task
5.5. Discussion
6. General discussion
Appendices
1. Abbreviations
2. Pilot Experiment
3. Experiment 2: testing the influence of presenting the stronger alternative
4. Displayed cards depending on the condition (Experiment 1a, 1b and dual-task)
5. Number of trials by condition (Experiments 1a, 1b and dual-task)
6. Instructions (experiment 1a and 1b)
7. Instructions (dual task)
8. Comparison between experiments
References
Acknowledgments
First, I want to thank Benjamin Spector and Emmanuel Chemla. They were great supervisors and
advisors. I was lucky to work with such people.
I also really thank Alexandre Cremers for helping me with IBEX and statistical analysis, and Florian
Pellet for helping me with IBEX and launching the experiments on Mechanical Turk.
I also thank the members of the ghost Writing Group, with a special mention to Mora and Adriana.
Last, I thank Juliette for her relevant questions and remarks during the first semester, Dominique for
the corrections, Floriane and Iryna for their patience and constant good mood, and Aymeric for his
general support.
Originality statement
This work is original for three main reasons:
First, we implement a paradigm that makes it possible to distinguish experimentally between three levels of
reading for standard scalar items (literal meaning, primary scalar implicature and secondary scalar
implicature), which had, to our knowledge, not been done until now: most experimental studies of SI
only oppose two levels of readings (literal meaning and scalar implicature). Establishing the existence
of primary and secondary scalar implicatures is a way to draw a link between linguistic theories and
experimental data, and to inform the current theoretical debate between the Gricean and the
Grammatical account of SI. In particular, our experimental design enables us to test the Gricean
account on a specific point of the theory: the role of the Competence Assumption in the “Epistemic
Step”.
Second, we use this paradigm to study three debated cases of Scalar Implicature: numeral
quantifiers, the plural morpheme and the modifier almost. We compare their behavior to the
standard case of SOME regarding the distinction between primary and secondary scalar implicatures.
Until now there were almost no experimental data concerning the modifier almost, and it had not
even been shown that the literal reading exists. Our results strongly support the SI account.
Numeral quantifiers have already been much studied: our results add to previous findings, revealing
a new kind of difference between numerals and standard scalar items. Last, the plural morpheme
has not been much investigated in the experimental literature: our results suggest that, like
numerals, it differs from standard scalar items, at least regarding the distinction between primary
and secondary scalar implicatures.
Last, we study the effect of implementing a dual-task on primary and secondary implicatures.
Contribution statement
This internship was jointly supervised by Benjamin Spector and Emmanuel Chemla.
Here are the main contributors:
Choice and definition of the scientific issue: B. Spector, E. Chemla, A. Dieuleveut
Bibliographical review: A. Dieuleveut, B. Spector, E. Chemla
Development of the methodology:
- Pilot: idea: B. Spector, E. Chemla
Implementation on IBEX: A. Dieuleveut, A. Cremers (SubHtml)
- Improvements of the pilot, experiments 1a and 1b: A. Dieuleveut, E. Chemla, B. Spector
Implementation on IBEX : A. Dieuleveut
- Dual task experiment: idea: B. Spector, E. Chemla, based on a study of P. Marty
Implementation on IBEX: A. Dieuleveut, F. Pellet (feedback), A. Cremers (randomization)
- Testing participants (launching the experiments via Mechanical Turk): F. Pellet, A. Cremers
Data analysis: A. Dieuleveut, E. Chemla, A. Cremers
Interpretation of the results: A. Dieuleveut, B. Spector, E. Chemla
Writing of the thesis, tables and figures: A. Dieuleveut
Corrections of the thesis:
- orthography: D. Juffin
- content: B. Spector, E. Chemla
Abstract
When you hear a sentence such as “Some of the cards are hearts”, you tend to understand that “it is
not the case that all cards are hearts”, even if the sentence with SOME is logically true if all cards are
hearts. This kind of linguistic inference is called a scalar implicature. To date, several accounts of this
phenomenon have been proposed, mainly concerned with the question of the nature of the
mechanism at stake.
This experimental work addresses the current theoretical debate between the Gricean and the
Grammatical accounts of scalar implicatures by studying the distinction between primary scalar
implicature and secondary scalar implicature. As proposed by Sauerland (2004), we can theoretically
distinguish three levels of understanding for a sentence such as “Some of the cards are hearts”: (i)
“the speaker believes that SOME, and possibly ALL, cards are hearts” (literal reading), (ii) “it is not
the case that the speaker believes that ALL cards are hearts” (primary reading) and (iii) “the speaker
believes that it is not the case that ALL cards are hearts” (secondary reading). According to the
Gricean account, these three readings are accessed incrementally, with the final step from primary to
secondary implicature (known as the “Epistemic Step”) depending on the hypothesis of the speaker
being “opinionated”, called the “Competence Assumption”.
Based on a paradigm manipulating the informational state of a fictional speaker, we show with three
web experiments that for the paradigmatic example of SOME, these three levels of reading can be
distinguished. Furthermore, we show that the computation of the secondary reading does not
depend solely on the Competence Assumption: it can be accessed even when the speaker is not fully
informed.
Three other debated cases of SI are tested and compared to SOME using the same paradigm:
ALMOST, NUMERALS and PLURAL. We show that the behavior of ALMOST is highly similar to SOME,
supporting an SI account of ALMOST. Regarding PLURAL and NUMERALS, only two levels of reading
could be distinguished, namely, the literal reading and the secondary reading, suggesting that the
mechanism underlying the computation may differ between these items and standard scalar items.
In order to test the processing properties of these three levels of reading, we further implemented a
dual-task version of the experiment, as a way to better inform the debate between the Gricean and
the Grammatical account, both being able to predict the existence of three distinct readings. This
part is not conclusive yet, but deserves deeper investigation.
1. Introduction: theoretical background on Scalar Implicatures
1.1. The two main accounts of Scalar Implicatures
1.1.1. The Gricean account: distinguishing three levels of reading
When you hear a sentence such as “Some of the cards are hearts”, you tend to understand it as
meaning that “Some but not all of the cards are hearts”, even if, logically speaking, the sentence is
true when all cards are hearts. This kind of linguistic inference is called a Scalar Implicature (which
we will abbreviate SI from now on1). SI are a particular case of conversational implicatures, a concept
introduced by Grice (1975): these are inferences that, instead of directly coming from the linguistic
meaning of a sentence, result from a pragmatic reasoning taking into account the communicative
intentions of the speaker.
The debate about SI has mostly centered on the question of how these inferences come about.
Today, the dominant view is the Gricean account (Grice, 1975; Horn, 1972; Gazdar, 1979; Spector,
2003; Sauerland, 2004; van Rooij & Schultz, 2004; Russell, 2006, among others). Crucially, the
computation of an SI involves the comparison of the sentence actually uttered with a minimally
different sentence, called its scalar alternative, that the speaker could have uttered in the same
situation.
More precisely, the Gricean account distinguishes the following steps in the computation of an SI:
When you hear a sentence (S) containing a scalar item, for example “Some of the cards are hearts”:
1. First, you compute the literal meaning of (S), written as follows:
[[S]]: “Some, and possibly all, cards are hearts.”
Applying the Gricean maxim of quality2, you understand that “the speaker believes that S is true”,
which we will write B (S). (B (X) stands for “the speaker believes that X”).
2. Then, you compare the uttered sentence (S) with an alternative to this sentence, (S’), i.e. a
sentence that the speaker could have chosen in such a situation. Alternatives are obtained by
replacing, in the sentence, the scalar term by other expressions belonging to the same scale
(in our example, <SOME, ALL>).
(S’): “All of the cards are hearts”.
1 The main abbreviations used in this work are presented in Appendix 1.
2 In his William James lectures, Grice proposed that conversation rested on a “principle of cooperation”, that could be characterized by several maxims of conversation, which the speakers are supposed to follow. Two of the four Gricean conversational maxims are relevant to our purpose here:
- Maxim of quality: say only what you know to be true.
- Maxim of quantity: be as informative as needed.
The adjective “scalar” comes from the role of these scales in the derivation. Importantly, the
members of a scale can be ordered according to their logical strength: a hearer can thus compare (S)
and its alternative (S’) in terms of informativeness. In our example, (S’) asymmetrically entails (S):
uttering (S’) would be more informative than (S).
3. Assuming that the speaker is cooperative and gives as much information as she can
(following the Gricean maxim of quantity), you can infer that it is not the case that she
believes that (S’) is true. Indeed, if she had held this belief, she would have used (S’) instead
of (S).
This strengthened meaning is called the Primary Scalar Implicature (PSI from now on).
¬B (S'): “it is not the case that the speaker believes that all of the cards are hearts.”
4. According to most recent Neo-Gricean accounts (Sauerland, 2004; Spector, 2003, a.o.), you
can go further in the computation: assuming that the speaker is well-informed, you can infer
that the speaker believes (S’) to be false.
This strengthened meaning is called the Secondary Scalar Implicature (SSI).
B(¬S’): “the speaker believes that it is not the case that all of the cards are hearts.”
The step from PSI to SSI is called the Epistemic Step (Sauerland, 2004)3. According to a strict version
of the account, it relies on the assumption that the speaker is well-informed (or “opinionated”),
called the Competence Assumption: the speaker is knowledgeable regarding the truth value of (S’)
(Geurts, 2009).
Formally, the Competence Assumption corresponds to B(S’) ∨ B(¬ S’).
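The Epistemic Step can be spelled out as a short derivation. The following is only a sketch in the notation introduced above (B for speaker belief): it applies disjunctive syllogism to the PSI and the Competence Assumption, and is meant as an illustration rather than as the formulation of any particular author.

```latex
\begin{align*}
 & \neg B(S')              && \text{(PSI)} \\
 & B(S') \lor B(\neg S')   && \text{(Competence Assumption)} \\
\Rightarrow\; & B(\neg S') && \text{(SSI, by disjunctive syllogism)}
\end{align*}
```

Without the second premise the inference does not go through: ¬B(S’) alone is compatible with the speaker having no opinion about (S’).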
Table 1 summarizes the three levels of reading distinguished in the standard Neo-Gricean account.
Reading | Name | Formula | Paraphrase
LIT | Literal meaning | B(S) | The speaker believes that (S) is true.
PSI (SI) | Primary Scalar Implicature | B(S) ∧ ¬B(S’) | The speaker believes that (S) is true and it is not the case that the speaker believes that (S’) is true.
SSI (SI) | Secondary Scalar Implicature | B(S) ∧ B(¬S’) | The speaker believes that (S) is true and the speaker believes that (S’) is not true.
Table 1. Three levels of reading for scalar items.
Example: (S) = “Some of the cards are hearts.”
(S’) = “All of the cards are hearts.”
(S’) asymmetrically entails (S).
3 The distinction between PSI and SSI was first introduced by Sauerland (2004) in order to account for the case of
disjunction (scale <OR, AND>), which we will not explain in detail here.
We have until now given the example of the scale <SOME, ALL>, but this reasoning can be applied to
many other scales, called Horn scales (Horn, 1972): other quantifiers (ex. <FEW, NONE>), connectives
(ex. <OR, AND>), numeral quantifiers (<ONE, TWO, THREE …>), verbs, adjectives, etc. The reasoning
can even be applied to contextual scales (Hirschberg, 1985), but we will not address this point here.
1.1.2. The Grammatical account
The Gricean account has recently been challenged by another approach, the Grammatical approach
(Chierchia 2004, 2006; Chierchia, Fox & Spector, 2008, 2009, 2012; Fox 2007; Landman, 1998).
According to this account, the mechanism that gives rise to SI is not pragmatic but grammatical in
nature. It relies on the application of a phonologically null grammatical operator, often written O
because its meaning is similar to the meaning of the word only. Crucially, O insertion is a syntactic
process.
Under this view, a sentence (S) such as “Some of the cards are hearts” is structurally ambiguous between
two parses:
- Parse without the operator:
[[S]]: “Some, and possibly all, cards are hearts.”
One may further apply pragmatic mechanisms (maxim of quality) and obtain B([[S]]), which
corresponds to the literal meaning under the Gricean approach.
- Parse with the operator O:
[[O(S)]]: “Only some of the cards are hearts”.
This parse leads to B([[O(S)]]), which corresponds to the SSI:
B(O(S)) = B(S ∧ ¬S’) (“only some” means that “some, but not all, cards are hearts”).
The listener has to make a disambiguation choice between the two parses. The factors that play a
role in this choice are not clearly specified by the theory, but evidence that the speaker is
knowledgeable or opinionated is one of them (Fox, 2007).
Under this approach, the SSI is not derived from the PSI. The PSI involves a pragmatic reasoning
similar to the Gricean mechanism, and can be obtained after the disambiguation choice.
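The effect of the O operator on the <SOME, ALL> scale can be illustrated with a toy possible-worlds model. This is a minimal sketch under stated assumptions, not an implementation taken from the grammatical literature: worlds are identified with the number of hearts among N cards, propositions are sets of worlds, and the names `exhaustify` and `believes` are hypothetical.

```python
# Toy model (illustrative assumption): a world is the number of hearts
# among N cards; a proposition is the set of worlds where it is true.
N = 4
WORLDS = frozenset(range(N + 1))

SOME = frozenset(w for w in WORLDS if w >= 1)  # [[S]]: some, possibly all
ALL = frozenset(w for w in WORLDS if w == N)   # [[S']]: all


def exhaustify(prejacent, alternatives):
    """O(S): the worlds of S in which every strictly stronger
    alternative is false (hypothetical helper)."""
    result = prejacent
    for alt in alternatives:
        if alt < prejacent:  # proper subset: alt asymmetrically entails S
            result = result - alt
    return result


def believes(belief_state, proposition):
    """B(X): X is true in every world the speaker considers possible."""
    return belief_state <= proposition


O_SOME = exhaustify(SOME, [ALL])
print(sorted(O_SOME))                         # [1, 2, 3]
print(believes(frozenset({2}), O_SOME))       # True: speaker knows there are 2 hearts
print(believes(frozenset({2, 3, 4}), O_SOME)) # False: speaker cannot exclude ALL
```

In this sketch B(O(S)) holds exactly when the speaker's belief state excludes the ALL world, which matches B(S ∧ ¬S’) above.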
Other accounts have been proposed for SI (see Sauerland, 2012, for a recent summary), but we will
not present them in detail here. We will occasionally refer to the default theory, according to which
SI are generated automatically and then can be cancelled (Levinson, 2000). As this theory is not
supported by experimental results, we won’t develop it thoroughly here.
1.1.3. Comparison of the two accounts concerning the distinction between PSI and SSI
The two accounts presented so far agree on some facts, notably on the role of alternatives in the
derivation and on the existence of Horn scales, as well as certain linguistic properties of SI such as
cancelability (often considered as the hallmark of SI). The debate essentially bears on the division of
labor between semantics and pragmatics in the derivation of SI, more specifically on the nature of
the mechanism by which (S’) is negated. According to the Gricean account, this mechanism is
pragmatic in nature: it requires taking into account the intentions and beliefs of the speaker.
According to the Grammatical account, the mechanism is primarily grammatical, and you don’t
necessarily have to take into account the mental states of the speaker to compute a scalar
implicature.
In particular, these accounts differ on the status of the different readings we have distinguished, and
this is what we will focus on, as a way to help tease apart the two theories4. The status of the three
readings is summed up in Figure 1.
(a) In the Neo-Gricean account of SI, the derivation of SI is incremental: you first compute LIT,
then you optionally derive PSI, and then you optionally derive SSI if the Competence
Assumption holds.
(b) In the Grammatical account, PSI are derived at the end, after the decision whether or not to
apply the O operator. PSI are pragmatic and SSI are grammatical.
NB: Under the Gricean account, you can’t derive SSI if the speaker is not well-informed, because you
compute SI according to the mental state of the speaker.
Under the Grammatical account, it is possible to derive SSI as a possible reading of the sentence even
if the speaker is not a priori taken to be well-informed, although it would not be a preferred parse.
4 There are other important theoretical differences between the two accounts. In particular, under the Gricean view, implicatures are computed globally (at the level of the sentence) and depend on a general cognitive system, whereas under the Grammatical account, implicatures can be computed locally and depend on a specific cognitive system, grammar.
Figure 1: comparison between Gricean and Grammatical accounts for PSI and SSI.
(Simplified version of Chemla & Singh, 2013)
Neo-Gricean Account
Let (S) be a sentence containing a scalar item.
(1) Applying the maxim of quality, you obtain that the speaker believes the literal meaning, B(S).
(2) Scalar alternatives (S’) are obtained by replacing the scalar item by other members of the scale.
(3) Applying the maxim of quantity, we infer that ¬B(S’), otherwise the speaker would have uttered (S’).
(4) Assuming that the speaker is well-informed (Competence Assumption), we can strengthen the meaning into B(¬S’).
Grammatical Account
Let (S) be a sentence containing a scalar item.
(1) We have the choice to apply or not a phonologically null operator, written O, which has a meaning similar to only.
(1a) Parse without the operator: B(S)
(1b) Parse with the operator: B(O(S)) = B(S ∧ ¬S’)
(2) We can further obtain PSI by Gricean reasoning.
1.2. The experimental study of SI as a way to inform theoretical debates
1.2.1. Previous experiments and methodological considerations
Experimental studies of SI have developed considerably over the last decade, and have proven to be a
useful tool to inform theoretical debates. Diverse methodologies have been used, both with adults,
using truth value judgment tasks, response-time studies (Bott & Noveck, 2004; Noveck & Posada,
2003), self-paced reading (Breheny, Katsos & Williams, 2006; Bergen & Grodner, 2012),
mouse-tracking (Tomlinson, Bott & Bailey, 2011), eye-tracking (Huang & Snedeker, 2009; Grodner, Klein,
Carbary & Tanenhaus, 2010) and dual-task paradigms (De Neys & Schaeken, 2007; Dieussaert, Verkerk,
Gillard & Schaeken, 2011; Marty & Chemla, 2013; Marty, Chemla & Spector, 2013), and with children
(Smith, 1980; Noveck, 2001; Papafragou & Musolino, 2003, among others).
We will only present the ones that are relevant to our purpose here.
One of the first experiments assessing SI was conducted by Bott & Noveck (2004). Based on previous
paradigms implemented by Rips (1975) and Smith (1980), it consisted in a sentence-verification task:
participants were presented with under-informative sentences such as “Some elephants are
mammals” and had to indicate whether the sentence was true or false. Crucially, this kind of
sentence is true if you compute the literal meaning, and false if you compute the implicature. This
was a way to establish the existence of two distinct levels of reading.
1.2.2. Goal 1: a more fine-grained approach of SI
Many experimental studies of SI are based on a Truth Value Judgment Task paradigm similar to that of Bott
& Noveck, which makes it possible to distinguish the literal meaning (LIT) from the implicated
meaning (SI). They treat SI as a whole, without distinguishing PSI and SSI.
Our main goal was to show experimentally that, following the theoretical literature, three levels of
reading (LIT, PSI and SSI) could be distinguished for standard scalar items. Following the paradigm of
Bott & Noveck, this meant finding a case where PSI and SSI gave rise to different answers, i.e. a case
in which the PSI was true, but not the SSI.
One way to achieve this is to manipulate the level of information of the speaker: in a context where
the speaker is ignorant about the truth value of the alternative, a sentence like “Some cards are
hearts” will be judged appropriate with a PSI reading, whereas with a SSI reading, it will be judged
inappropriate, because the speaker does not have enough information to assert it.
We will come back in greater detail to the paradigm we implemented in Chapter III.
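The logic of this manipulation can be sketched with a toy model of the speaker's information state. This is only an illustration under stated assumptions, not the experimental implementation: we assume N cards of which the speaker has seen some (the rest being hidden from her), and the function names are hypothetical.

```python
def belief_state(n_cards, hearts_seen, non_hearts_seen):
    """Worlds (total number of hearts) compatible with what the speaker
    has seen; each unseen card may or may not be a heart."""
    unseen = n_cards - hearts_seen - non_hearts_seen
    return {hearts_seen + k for k in range(unseen + 1)}


def readings(n_cards, hearts_seen, non_hearts_seen):
    """Truth of LIT, PSI and SSI for 'Some of the cards are hearts'."""
    state = belief_state(n_cards, hearts_seen, non_hearts_seen)
    b_some = all(w >= 1 for w in state)          # B(S)
    b_all = all(w == n_cards for w in state)     # B(S')
    b_not_all = all(w < n_cards for w in state)  # B(not S')
    return {
        "LIT": b_some,
        "PSI": b_some and not b_all,
        "SSI": b_some and b_not_all,
    }


# Partially informed speaker: two hearts seen, two cards hidden.
print(readings(4, 2, 0))  # {'LIT': True, 'PSI': True, 'SSI': False}
# Fully informed speaker, two hearts out of four.
print(readings(4, 2, 2))  # {'LIT': True, 'PSI': True, 'SSI': True}
# Fully informed speaker, four hearts out of four.
print(readings(4, 4, 0))  # {'LIT': True, 'PSI': False, 'SSI': False}
```

The first call corresponds to the critical case described above: the PSI is true but the SSI is not, so the two readings predict different judgments.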
Distinguishing the SSI from the PSI is a way to test the Gricean account of SI, which clearly postulates
that the three readings exist and are derived incrementally. Our paradigm further makes it possible to
test the role of the alleged Competence Assumption in the “Epistemic Step”. The underlying reasoning is
close to one proposed by Fox (2014) to test the involvement of the maxim of quantity in the
Gricean account. In a thought experiment, he considers a situation (a TV game show) where the
speaker is uncooperative and does not follow the maxim of quantity, i.e. does not give all the
information: according to a strict Gricean account, there should be no SI in such a context, as
their derivation crucially relies on the maxim of quantity. The rationale underlying this paradigm is
relevant for us: blocking the Competence Assumption and seeing whether SSI still arise is a way to
test its role. If the strict version of the Neo-Gricean schema is correct, there should be no SSI when
the speaker is not well-informed.
1.2.3. Goal 2: studying the processing patterns of PSI and SSI
Importantly, both the Gricean and the Grammatical accounts can predict the bare existence of the
three readings: they really differ on the question of the relations between them (Chemla & Singh,
2013). The Gricean account clearly states that the SSI is derived from the PSI, in an incremental way,
whereas in the Grammatical account, there is an ambiguity between LIT and SSI, the PSI being
derived afterwards, with different mechanisms. Do empirical data support the latter or the former
view? This was the second question we wanted to answer.
It has been shown using diverse methodologies that there is a cost associated with SI computation. In
particular, the study of response times in classical truth value judgment tasks (Bott & Noveck, 2004;
De Neys & Schaeken, 2007; Noveck & Posada, 2003; Rips, 1975, a.o.) showed that, for target
sentences, participants who judged the statements to be false were slower than those who judged
them to be true. Similarly, the rate of SI increased as the permitted response time increased. This
suggests that SI is derived from LIT, in an incremental way, as in the Gricean account. This finding was
confirmed using other methodologies (dual-task paradigms, eye-tracking and mouse-tracking
studies): SI are derived with a delay (Bott, Bailey & Grodner, 2012).
As said before, these studies only distinguish SI and LIT. Our second goal was therefore to study the
processing properties of the three levels of reading, which is of interest in two respects: it
is first a way to inform the theoretical debate between the Gricean and the Grammatical accounts,
and it is also a way to better understand the exact nature of the cost observed for SI.
Comparing the processing patterns of PSI and SSI could be done using several methodologies. We
first aimed to study response times, but, possibly for technical reasons linked to the fact
that we ran online experiments, our results were too noisy. Hence, we turned to another paradigm: a
dual-task experiment, which we will present in detail in Chapter IV.
1.2.4. Related studies
To our knowledge, it has never been shown straightforwardly that three levels of reading exist: the
PSI and the SSI are almost always confounded in experimental studies. However, the distinction
between PSI and SSI is sometimes addressed in the experimental literature5, specifically in studies
that manipulate the level of information of the speaker.
In particular, using a self-paced reading paradigm, Bergen & Grodner (2012) indirectly address our
point. The aim of their study differs from ours: it is to show that the speaker’s knowledge influences
SI computation. As they use the distinction between PSI and SSI (called, respectively, Weak SI and
Strong SI) and manipulate the level of information of the speaker, their study is closely linked to our
work, which is why we briefly present it here.
They implement a self-paced reading paradigm based on the following reasoning. Each trial consists of
three sentences: a context sentence, a trigger sentence and a continuation sentence. The context sentence
makes it possible to manipulate the level of information of the speaker (e.g. Full-knowledge: “I meticulously
compiled the investment report.” vs. Partial-knowledge: “I skimmed the investment report.”). The
trigger sentence is: “Some of the real estate investments lost money”. The continuation sentence is:
“The rest were successful despite the recent economic downturn.”
The dependent variable is the reading time of SOME and of THE REST. A longer reading time on
SOME indicates that you derive the implicature (it is harder), and a shorter reading time on THE REST
indicates that you have previously derived the strong scalar implicature (it is easier because you have
already accessed the referent, which is the “complement set”: in the example, “a set of investments
that did not lose money.”).
Testing the effect of the level of information of the speaker, this means that the reading times on
SOME should be longer (and, conversely, shorter on THE REST) in the Full-Knowledge condition than
in the Partial-Knowledge condition, indicating that you have derived the strong implicature. This is
indeed what they found.
⁵ Ignorance inferences: although they are not our focus, another kind of inference has also been discussed in the
literature: ignorance inferences (also called uncertainty inferences) (Chemla & Singh, 2013; Hochstein, Bale, Fox & Barner,
2014; Fox, 2014, among others). As they are closely related to our concern, we briefly explain how they differ from it. These
inferences correspond to the reading “the speaker is ignorant about the truth value of the stronger alternative (S’)”.
Importantly, Ignorance Inferences are not equivalent to PSI, even if the two are occasionally collapsed in the literature:
formally, PSI correspond to B(S) ∧ ¬B(S’), whereas Ignorance Inferences correspond to B(S) ∧ ¬B(S’) ∧ ¬B(¬S’). Under the
Gricean account, Ignorance Inferences result from a strengthening mechanism applied to the PSI when the Competence
Assumption cannot be made. However, it is perfectly possible that the meaning is not strengthened from the PSI.
This experiment is interesting in two respects. First, it shows that the knowledge state attributed to
the speaker influences the likelihood of deriving an implicature: more implicatures are derived in the
“full-knowledge” condition. Second, and more relevant to our purpose, their paradigm may make it possible
to establish the existence of SSI per se. Indeed, the decrease in reading time on THE REST is explained
by the fact that the participant has already accessed the “complement set”, which may not be the case if the
participant derived only the PSI and not the SSI. However, this can be questioned: deriving the PSI
may also bring the participant closer to the belief that the “complement set” is not empty, so the PSI
could also explain the difference found.
In any event, this paradigm does not enable us to distinguish between three levels of reading. Nor do the
results show that PSI exist: the difference in reading times can be explained by opposing
LIT to SSI. It would be interesting to know whether there are cases in which reading times differ
on SOME but not on THE REST, which could indicate access to the PSI but not to
the SSI. However, since this is not the focus of their experiment, the authors do not address this
question.
We will briefly present a second paradigm that has striking similarities with ours. In a study assessing
the computation of SI in autistic people, Hochstein & Barner (unpublished) use a Partial-Knowledge
task as a test of Epistemic Reasoning (Experiment 3). The principle of the paradigm is the following.
There are three boxes. The speaker can be either knowledgeable (knowing the content of all boxes)
or ignorant (knowing the content of only two out of the three boxes). A sentence is uttered:
“Some/Two/All of the boxes have strawberries”. The question asked is: “Do you think there are
strawberries in this box?” The case with the knowledgeable speaker makes it possible to distinguish LIT (“I
don’t know” answers) from SI (“no” answers): this is the classical condition used to test SI.
When the speaker is ignorant, the expected answer is “I don’t know”: subjects are not licensed to
derive the implicature, as the speaker is not in a position to know whether all of the boxes have
strawberries. They find that in this last condition, autistic people tend to answer “no” for sentences
containing SOME, which they interpret as showing that autistic people are more likely to compute SI in
incorrect contexts (i.e. without epistemic justification). However, their paradigm does not show that
autistic people have derived the SSI, but only that they do not take into account the level of
information of the speaker when interpreting the sentence. Indeed, they do not test the ignorance
condition with “All” or “Only some”, which would show whether the incorrect answer is specific to
SI computation or reflects a general deficit in taking the speaker’s information level into account.
Nor does their paradigm enable us to distinguish between three different levels of reading.
The paradigm we shall present is very close to this one, but differs crucially in the question asked to
the participant.
1.3. “Scalar diversity”: to which linguistic expressions does the scalar enrichment
mechanism apply?
In the first part of this introduction, we have presented the debate between the Gricean and the
Grammatical accounts, which concerns the division of labor between grammar and pragmatics in SI
computation. An orthogonal question to this is the following: how general is this phenomenon?
Which linguistic expressions can be considered as scalar items?
Our third goal was to use our paradigm to inform this question, as a “diagnostic” tool for three
debated cases: NUMERALS, PLURAL, and ALMOST.
1.3.1. Not just testing SOME: why study a broader range of implicatures?
As pointed out by van Tiel et al. (2014), the experimental literature on SI has mainly focused on the
example of SOME⁶. It is important not to limit our investigations to this paradigmatic case, for at least
two reasons. First, studying other scalar items makes our conclusions more generalizable
and avoids the criticism that they could be explained by a specificity of the word SOME. Second,
experimental studies comparing the behavior of different scalar items have shown that they are
not equivalent in many respects: the rate of derivation can differ substantially from one scalar item to
another (van Tiel, 2014), and, importantly, their processing characteristics have also been shown to
differ, as we will see below.
1.3.2. Informing theoretical debates on three expressions
As the case of SOME, often considered as the paradigmatic case of SI, has been extensively studied,
we will use it as a baseline. We are now going to present three cases that can be accounted for with
a theory of SI, but for which this approach is controversial.
1.3.2.1. Numeral quantifiers
There is an ambiguity for a sentence such as “n cards are hearts” (with, for example, n=2): it can
mean “at least two cards are hearts” or “exactly two cards are hearts”. It has been proposed (Horn,
1972; see Spector, 2013, for a recent summary) to analyze numerals as a case of SI. Under this
account, the literal meaning is “at least n cards are hearts”. The scale to consider is <ONE, TWO,
THREE, …> (“at least n+1” asymmetrically entails “at least n”). The implicature, “exactly n
cards are hearts”, is obtained by a strengthening mechanism akin to the one presented in 1.1.1.
⁶ The disjunction (scale: <OR, AND>) has also been the target of many studies, but we shall not address this case here.
The case of NUMERALS is one of the most studied cases after SOME, and there is a great deal of
evidence suggesting that the two behave differently. First, this has been shown by studying the syntactic
distribution of the assumed “implicated” (“exactly”) reading, compared to SOME (Horn 1992;
Breheny, 2008). More recently, acquisition studies have shown that children acquire the NUMERALS
scale earlier than other scales: whereas children are known to acquire SI quite late, Papafragou &
Musolino (2003) found that 66% of 5-year-old children accessed the strengthened reading for
NUMERALS, whereas only 12.5% did for SOME (see also Huang & Snedeker, 2009). Assuming
that this is explained by a problem in accessing scalar alternatives (see Barner &
Bachrach, 2010, Bale & Barner, 2013, a.o.), this result can be accounted for within a theory of
NUMERALS as SI, but it can also be taken to show that NUMERALS have an “exactly” lexical meaning.
Last, the latter conclusion is supported by processing studies using eye-tracking (Huang and
Snedeker, 2009a) or a dual-task paradigm: Marty & al. (2013) found that tapping memory resources
had opposite effects on SOME and on NUMERALS (under high cognitive load, participants derived
fewer SI for SOME, but more “exactly” readings for NUMERALS, suggesting that the “basic” meaning
of numerals is the “exactly” meaning).
1.3.2.2. Plural
The plural/singular distinction is the source of a long-standing debate in semantics. Intuitively, it
seems that the meaning of the plural morpheme is “strictly more than one” (Lasersohn, 1995).
However, some linguistic observations suggest that this definition is not sufficient: there are contexts
in which the plural morpheme can be interpreted as meaning “at least one”, e.g. under negation: a
sentence such as “There are no cards on the desk” will be considered false even if there is only one
card on the desk.
It has been proposed to treat the “strictly more than one” component of the meaning of the plural
morpheme as an implicature rather than an inherent part of its semantics (Sauerland, Anderssen &
Yatsushiro, 2005; Spector, 2007; Zweig 2008). According to this account, the literal meaning of the
plural morpheme is “at least 1”. Simplifying somewhat⁷, the scale to consider is <PLURAL,
SINGULAR> (SINGULAR meaning “exactly one”, thus asymmetrically entailing PLURAL). The
implicature is thus “at least 2”.
There are few experimental studies on this case. Pearson, Khan & Snedeker (2011) found that under
certain circumstances, it was possible to cancel the “more than 1” meaning component (one of the
hallmarks of implicatures), even if this cancelation is quite difficult to obtain. This has also been
investigated in children (Tieu, Bill, Romoli & Crain, 2014).
⁷ To be more precise, the plural case can be viewed as a case of Higher Order Implicature (see Spector, 2007, for details).
1.3.2.3. Almost
Last, there is a debate on the semantics of the modifier ALMOST. Intuitively, the meaning of
“ALMOST X” (with, for example, X = ALL or X = NO) seems to be that “X is close to being true, but is in
fact not true”.
This “not X” part of the meaning of ALMOST has been analyzed as a case of SI (Sadock, 1981; Spector,
2014). The reasoning is similar to the reasoning for SOME: the literal meaning is “almost X and
possibly X”; the scale to consider is <ALMOST X, X> and the implicature is “almost X but not X”
(otherwise the speaker would have said X).
However, this account of the meaning of ALMOST is far from consensual: in particular, the
“not X” part of the meaning has also been analyzed as an entailment (Hitzeman, 1992; Horn, 2011;
Kilbourn-Ceron, 2015).
So far, there are no real experimental data on this question. Establishing the existence of the literal
reading for ALMOST would be a strong argument against the entailment view and in favor of the SI
account.
1.3.3. Summing up
From a theoretical point of view, we can hold a consistent account of these three cases in terms of SI.
One way to inform the theories is to confront them with the data: can we distinguish three levels of
reading (LIT, PSI and SSI) for NUMERALS, PLURAL and ALMOST? We aimed to investigate the extent
to which these scalar items behave like SOME, considered as the paradigmatic case
of SI.
Importantly, the question of the differences between scalar items can be linked to the debate
concerning the nature of the mechanism that gives rise to SI: the mechanisms underlying these
three cases are not necessarily the same, and comparing their behavior is a way to inform the more
general debate presented in 1.1.
2. Work done during the internship
Before turning to the methodological part, here is a short summary of the work done during the
internship.
Our primary goal and guiding thread was to find an experimental design making it possible to establish the
existence of LIT, PSI and SSI for the most classical case of SI, SOME. This work was essentially
methodological. Two pilot experiments, presented in Appendix 2, enabled us to settle on such a
paradigm. We then ran three sets of experiments, all based on the same paradigm.
The first set of experiments consisted of two experiments.
The first one (Experiment 1a) is presented in Chapter III. It establishes the existence of three
readings: LIT, PSI and SSI, and further assesses the differences between scalar items: we show that
for SOME and ALMOST, we detect LIT, PSI and SSI, whereas for NUMERALS and PLURAL, we only
detect LIT and SSI. Chapter IV presents a control experiment we had to run afterwards (Experiment
1b), which replicated the results obtained in 1a.
Experiment 2, launched simultaneously with Experiment 1a, is presented in Appendix 3. It yielded no
result, probably because of the number of cards presented (4 cards, whereas there were 8 cards in
Experiment 1). Its main goal was to see the effect of presenting the stronger alternative (sentences
containing ALL) on the PSI and the SSI respectively. The experiment was also partly motivated by one of
the problems we had when running the pilots: the rate of LIT was very high, which made it hard to
differentiate PSI and SSI; we thus explored different factors known to increase implicated readings,
presenting the stronger alternative being one of them. More details are given in the appendix.
The last experiment is presented in Chapter V. It applies the dual-task methodology to the paradigm
established in Experiment 1. It had two main goals. The first was to study the processing properties of the
three readings distinguished (a point we wanted to address in Experiment 1 by studying response
times, but our experimental design was not well suited to this purpose). The dual-task methodology
was interesting because it was more compatible with an online experiment, and it gave us
more precise information on processing. The second goal was to go further in the comparison
between scalar items: we wanted to see whether we replicated the differences obtained in
Experiment 1 with a more fine-grained approach.
The chapter presents two experiments: Experiment 3a is a reduced version of Experiment 1b,
and serves as a baseline for Experiment 3b, which is the dual-task version of the experiment.
3. Experiment 1a: existence of primary and secondary
implicatures for different scalar items
3.1. Goal
The main goal of this experiment was to show that we could experimentally distinguish between
three readings for standard scalar items: the literal reading (LIT), the primary implicature (PSI) and
the secondary implicature (SSI). These three readings are summed up in table 1 below.
Our second goal was to show that the access to the SSI does not depend only on the Competence
Assumption: the SSI can be accessed even when the speaker is not fully informed.
Finally, we use this paradigm as a tool to investigate other cases that have been analyzed as SI:
NUMERALS, PLURAL and ALMOST. We compare their behavior to the behavior of SOME.
Table 1: Three readings (example of SOME)
LIT | Literal reading | B(some)
PSI | Primary Scalar Implicature | B(some) ∧ ¬B(all)
SSI | Secondary Scalar Implicature | B(some) ∧ B(¬all)
3.2. Method and materials
The experiment consisted of a truth-value judgment task. It was an online experiment, hosted on
Alex Drummond’s Ibex Farm. Participants were recruited via Mechanical Turk and were paid for their
participation.
Participants were presented with a picture made up of two sets of eight cards. Each set of cards
represented the beliefs of a player, Peter or Mary, as shown in the example below:
We manipulated the information level of the players by putting some of the cards face-down. One of
the players was fully informed: he could see all of the cards. The other player was only partially
informed: some of his cards were presented face-down, with a question mark ? printed on them. It was
made clear that Mary and Peter were in front of the same cards but unequally informed.
A sentence was attributed to Mary or Peter: it was displayed on the right or on the left side of the
screen, depending on the speaker, in order to facilitate the matching between the sets of beliefs of
the speaker and the sentence. The speaker could thus be either fully informed (“Knowledgeable
Speaker”) or only partially informed (“Ignorant Speaker”).
Participants had to judge whether the speaker could or could not have said the sentence, according
to her informational state. Two answers were possible: “Mary/Peter can say that” or “Mary/Peter
cannot say that”. For the sake of simplicity, we will use “yes” and “no” to refer to these answers from
now on.
Importantly, with this type of judgment, as opposed to a bare true/false or yes/no judgment⁸, there
are two reasons for rejecting a sentence:
- the sentence is logically false (it does not match the actual world),
- the speaker does not have enough information to know whether the sentence is true or false⁹.
We presented both the cards seen by the speaker and the cards seen by the other player in order to
check that the participant’s answer depended on the beliefs of the speaker, and not on
the actual situation. Thus, half of the information at each trial was in principle useless: the cards seen
by the other player were supposed not to influence the answer.
Experimental items
Each trial was composed of a sentence and a picture. We first describe the sentences tested, and
then the pictures associated with them.
Sentences
The sentence presented was always of the form “X cards are Y.”
X could be: SOME OF THE, TWO, SOME, ALMOST ALL, ALMOST NO, ALL, NO.
Y could be: heart(s), diamond(s), spade(s), club(s).
⁸ In pilot experiments, we first used another type of question: we directly asked “Can Mary/Peter say that?”. Participants had to choose between two (TRUE/FALSE) (pilot 1) or three possible answers (TRUE/FALSE/NOT ENOUGH INFORMATION) (pilot 2). The 2-answer version of the pilot was not sensitive enough to detect SSI. The 3-answer version enabled us to detect SSI, but having three answers made the results less easy to interpret. See Appendix for details.
⁹ We could add to this a third reason: the sentence is true but the speaker believes that it is false. We have not used this option.
Target sentences
Four scalar items were tested: SOME, NUMERALS, PLURAL and ALMOST.
The corresponding sentences were the following:
SOME | “Some of the cards are [hearts].”¹⁰
ALMOST | “Almost all cards are [hearts].” / “Almost no card is a [heart].”
NUMERALS | “Two cards are [hearts].”
PLURAL | “Some cards are [hearts].”¹¹
Control sentences
Two types of sentences, which did not give rise to SI, were included as controls.
NO | “No card is a [heart].”
ALL | “All of the cards are [hearts].”
Sentences with NO were included in order to check that the participants correctly understood the
meaning of the ? cards, and did not interpret them as representing another type of card (“a card
which is not a heart”). The critical case was the following: when no heart is visible and some of the
cards are hidden, a participant who misunderstands the ? cards will answer “yes” (“Peter
can say that”), whereas a participant who understands them correctly will answer “no” (“Peter cannot say
that”). Participants who made more than 30% errors on these controls were removed from the
analysis. We also included control cases with NO that were clearly true or clearly false.
Sentences with ALL were added to counterbalance the possible effect of NO on the rate of
implicatures for SOME.
Conditions and pictures
Conditions
The cards displayed with the sentence corresponded to four conditions: the picture could make true
no reading (∅), the literal reading only (L), the literal and the primary readings (LP), or all three
readings (LPS). We refer to these conditions using the initials of the reading(s) they make true.
¹⁰ SOME can combine with a Noun Phrase to form a partitive (“Some of the cards are hearts”) or a non-partitive construction (“Some cards are hearts”). We chose the partitive construction because it had been shown that it favored SI computation (Degen & Tanenhaus, 2011).
¹¹ We considered other sentences to test the PLURAL case:
(1) “There are hearts.”: With this sentence, it was not clear whether “hearts” referred to the suit of the card or to the symbols printed on it: in target cases, participants could have answered “yes” because of seeing several symbols on the card.
(2) “There are cards which are hearts.”: This sentence was used in the pilot, but as it sounds quite unnatural to native speakers, we changed it.
∅ and LPS correspond to control conditions (∅: “no” controls; LPS: “yes” controls); L and LP correspond
to target conditions.
- In L, the speaker is knowledgeable and knows that the stronger scalar alternative is true.
If the participant accesses LIT, she will answer “yes”.
If she accesses the PSI or the SSI, she will answer “no”.
This is the classical case used to test implicatures.
- In LP, the speaker is ignorant and does not know whether the stronger scalar alternative is true.
If the participant accesses LIT or PSI, she will answer “yes”.
If she accesses the SSI, she will answer “no”.
This new case enabled us to distinguish SSI from PSI.
The table below presents the correspondence between conditions and cards for SOME.
Condition | Type | Speaker’s cards | What the picture makes true | LIT | PSI | SSI
∅ | Control (no) | ♤ ♤ ♤ ♤ ♤ ♤ ♤ ♤ / ♤ ♤ ♤ ♤ ? ? ? ? * | No reading | NO | NO | NO
L | Target | ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ | Only LIT | YES | NO | NO
LP | Target | ♥ ♥ ♥ ♥ ? ? ? ? | LIT and PSI, but not SSI | YES | YES | NO
LPS | Control (yes) | ♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤ | The sentence, whatever the reading | YES | YES | YES
*: A sentence can be rejected (i) because the sentence is false and the speaker knows it or (ii) because the
speaker does not have enough information to say the sentence.
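The correspondence between cards and readings for SOME can be illustrated with a short Python sketch. It is only an illustration of the definitions in Table 1, not the program used to build the stimuli. The function names, the encoding of a hand as a list of symbols with "?" for a face-down card, and the reading of “some” as “at least one” are assumptions made here.

```python
def beliefs(cards, target="♥"):
    """Return (B(some), B(all), B(not all)) for a speaker seeing `cards`.

    "?" stands for a face-down card, about which the speaker knows nothing.
    """
    visible = [c for c in cards if c != "?"]
    has_hidden = len(visible) < len(cards)
    n_target = sum(1 for c in visible if c == target)
    b_some = n_target >= 1                              # at least one target card is seen
    b_all = (not has_hidden) and n_target == len(cards)  # every card is seen and is a target
    b_not_all = any(c != target for c in visible)        # a non-target card is seen
    return b_some, b_all, b_not_all


def condition_for_some(cards, target="♥"):
    """Name the condition (∅, L, LP or LPS) that a hand instantiates for SOME."""
    b_some, b_all, b_not_all = beliefs(cards, target)
    lit = b_some                      # LIT: B(some)
    psi = b_some and not b_all        # PSI: B(some) and not B(all)
    ssi = b_some and b_not_all        # SSI: B(some) and B(not all)
    names = {
        (False, False, False): "∅",
        (True, False, False): "L",
        (True, True, False): "LP",
        (True, True, True): "LPS",
    }
    return names[(lit, psi, ssi)]


for hand in ["♤ ♤ ♤ ♤ ? ? ? ?", "♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥",
             "♥ ♥ ♥ ♥ ? ? ? ?", "♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤"]:
    print(hand, "->", condition_for_some(hand.split()))
```

Because SSI entails PSI and PSI entails LIT, only the four combinations listed in `names` can occur, which is why the conditions form the nested series ∅, L, LP, LPS.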
Table 2 presents the speaker’s cards for the four scalar items tested. We present only the target
conditions here: details are presented in Appendix 4.
Condition | SOME | TWO | PLURAL | ALMOST ALL | ALMOST NO
L | ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ | ♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤ | ♥ ♤ ♤ ♤ ♤ ♤ ♤ ♤ | ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥ | ♤ ♤ ♤ ♤ ♤ ♤ ♤ ♤
LP | ♥ ♥ ♥ ♥ ? ? ? ? | ♥ ♥ ♤ ♤ ? ? ? ? | ♥ ♤ ♤ ♤ ? ? ? ? | ♥ ♥ ♥ ♥ ♥ ♥ ♥ ? | ♤ ♤ ♤ ♤ ♤ ♤ ♤ ?
In the experiment, these four conditions were further divided into 7 conditions:
- The ∅ condition was subdivided into:
  (1) Expected answer: “no” because the sentence is false and the speaker knows it.
  (2) Expected answer: “no” because the speaker does not have enough information:
    (2a) the sentence is true according to the other player’s cards.
    (2b) the sentence is false according to the other player’s cards.
- The LP condition was subdivided into:
  (3a) the sentence is true according to the cards of the other player.
  (3b) the sentence is ambiguous according to the cards of the other player.
The comparison between (a) and (b) was a control to show that the cards of the other player did not
influence the answers.
Target cases represented 33% of trials for each type of sentence. There was the same number of
“yes” and “no” controls (see Appendix 5 for the detailed number of trials per condition).
For the control sentences with NO and ALL, as there were no SI associated, there were only 2
conditions (∅ and LPS). ∅ was subdivided into 3 conditions, following the principles explained above.
Pictures
The pictures consisted of a set of 16 cards (8 cards for each player). Only 8 of them actually mattered for the
judgment. The suit (spade/heart/diamond/club) and the rank (from 1 to King) of the cards were
chosen randomly using a program in Python. The suit of the other cards was chosen to be easily
distinguishable from the suit used in the sentence (if it was hearts, other cards could not be
diamonds; if it was spades, other cards could not be clubs). In half of the trials, Mary was the speaker;
in the other half, Peter was the speaker. Mary was always on the left side of the screen, and Peter
on the right side of the screen.
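A possible way of drawing such hands is sketched below in Python. This is not the program actually used for the experiment. The names `pick_filler_suit` and `draw_hand` are hypothetical, and the sketch assumes that the constraint stated for hearts and spades also holds for diamonds and clubs, and that all the other cards of a hand share a single suit.

```python
import random

# A filler suit must differ in color from the suit named in the sentence.
SUIT_COLOR = {"hearts": "red", "diamonds": "red", "spades": "black", "clubs": "black"}
RANKS = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King"]


def pick_filler_suit(target_suit, rng):
    """Pick at random a suit whose color differs from the color of the target suit."""
    candidates = [suit for suit, color in SUIT_COLOR.items()
                  if color != SUIT_COLOR[target_suit]]
    return rng.choice(candidates)


def draw_hand(target_suit, n_target, n_hidden, n_cards=8, rng=None):
    """Build one player's hand: target cards, then filler cards, then "?" cards."""
    rng = rng or random.Random()
    n_filler = n_cards - n_target - n_hidden
    if n_filler < 0:
        raise ValueError("n_target + n_hidden exceeds n_cards")
    filler_suit = pick_filler_suit(target_suit, rng)
    # Sampling ranks without replacement avoids duplicate cards within a suit.
    targets = [(rank, target_suit) for rank in rng.sample(RANKS, n_target)]
    fillers = [(rank, filler_suit) for rank in rng.sample(RANKS, n_filler)]
    return targets + fillers + ["?"] * n_hidden


# Example: a hand with four hearts and four face-down cards (LP condition for SOME).
print(draw_hand("hearts", n_target=4, n_hidden=4, rng=random.Random(0)))
```

Passing an explicit `random.Random` instance makes the draw reproducible, which is convenient when the same items have to be regenerated.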
Procedure
The experiment was hosted on Alex Drummond's Ibex Farm. After participants had given their consent to
participate in the experiment, they were given instructions (detailed instructions and
training are given in Appendix 6). There was then a training phase (14 unambiguous items with feedback),
followed by the experimental phase (288 trials with no feedback). The first four sentences with no
feedback were taken from the training phase, in order to get the subjects used to the 'no-feedback'
version of the experiment, and were removed from the analysis. At the end of the experiment, there
was a short questionnaire, with information on age, sex, native language, kind of device used to
answer and Mechanical Turk Worker ID.
3.3. Participants
60 participants were recruited via Mechanical Turk. 59 of them completed the task. We removed
from the analysis 1 participant whose native language was not English, 5 who made more than 35%
(mean-2*standard deviation) errors on controls, and 1 who made more than 31% errors on NO-controls.
We thus present the results for 52 participants (36 females, 16 males; mean age: 41.8, from
20 to 62 years old).
3.4. Results
Data analyses were conducted using R. We used binomial linear mixed-effects models, built with a
maximal random-effects structure with subjects and items as random factors, although we
sometimes had to fall back on random-intercepts-only models when the model failed to converge
with the full random-effects specification (following Barr et al., 2013).
Analysis of responses
Data treatment: We removed the trials with response times below 200ms or above 20000ms (less than 1%
of the data). We then removed, for each participant, the trials whose response times fell outside
mean +/- 2*standard deviation, keeping 94.9% of the data.
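The two-step trimming described above can be sketched as follows in Python (the analyses themselves were run in R, so this is an illustration rather than the original script). The function name and the data layout are hypothetical. The sketch assumes that the participant's mean and standard deviation are computed after the absolute cutoffs have been applied, and that the sample standard deviation is used; the text does not specify either point.

```python
from statistics import mean, stdev


def trim_response_times(rts_by_participant, low_ms=200, high_ms=20000, n_sd=2.0):
    """Return, for each participant, the response times (in ms) kept after trimming."""
    kept = {}
    for participant, rts in rts_by_participant.items():
        # Step 1: absolute cutoffs, identical for all participants.
        in_range = [rt for rt in rts if low_ms <= rt <= high_ms]
        if len(in_range) < 2:
            # A standard deviation cannot be computed from fewer than two values.
            kept[participant] = in_range
            continue
        # Step 2: participant-specific window of mean +/- n_sd standard deviations.
        m, sd = mean(in_range), stdev(in_range)
        kept[participant] = [rt for rt in in_range if abs(rt - m) <= n_sd * sd]
    return kept


# Made-up response times for one participant, for illustration only.
toy = {"p1": [150, 900, 1000, 1100, 950, 1050, 1000, 9000, 25000]}
print(trim_response_times(toy))
```

In this toy example, 150 and 25000 are removed by the absolute cutoffs, and 9000 is then removed because it lies more than two standard deviations above the participant's mean.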
Overall, the mean error rate on controls was very low (all controls: 2.4%; LPS: 2.9%; ∅: 2.0%).
Figure 1a shows the overall proportion of “yes” answers in the 4 conditions.
The mean rate of “yes” answers in L, corresponding to LIT alone, was 34.4%.
The mean rate of “no” answers in LP, corresponding to SSI alone, was 45.1% (mean rate of “yes”: 54.9%).
To obtain the rate of PSI, we compared the rates of answers in conditions L and LP, making the
assumption that the subjects were coherent in their readings: the rate of PSI is the rate of “yes”
answers in LP minus the rate of “yes” answers in L (54.9% - 34.4%). PSI thus represented 20.5% of the
readings.
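As a worked example, the decomposition of the answers into the three readings can be written as a small Python function. The function name is hypothetical; the computation assumes, as stated above, that subjects are coherent across the two target conditions.

```python
def reading_rates(yes_rate_L, yes_rate_LP):
    """Split answers into LIT, PSI and SSI rates (in %) from the two target conditions.

    "yes" in L is only compatible with LIT; "no" in LP is only compatible with SSI;
    PSI is what remains, i.e. "yes" in LP that is not already accounted for by LIT.
    """
    lit = yes_rate_L
    ssi = 100.0 - yes_rate_LP
    psi = yes_rate_LP - yes_rate_L
    return {"LIT": lit, "PSI": psi, "SSI": ssi}


rates = reading_rates(yes_rate_L=34.4, yes_rate_LP=54.9)
print({name: round(value, 1) for name, value in rates.items()})
```

By construction the three rates sum to 100%, so the PSI rate can equivalently be obtained as 100% minus the LIT and SSI rates.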
We first checked that there was no significant difference between the different instantiations of
conditions ∅ and LP (as explained before, ∅ was instantiated 3 times: “no because the sentence is
logically false”; “no because the speaker does not have enough information, but the sentence is true”;
“no because the speaker does not have enough information and the sentence is false”; LP was
instantiated twice, the other player’s cards making the sentence true or ambiguous). This verification
showed that there was no difference between the two reasons for rejecting a sentence and
that the other player’s cards did not influence the answers (see Figure 1b). In all following analyses,
we made this verification, but we will not systematically report it.
Figure 1a – Proportion of “yes” answers (all scalar items), 4 conditions
Figure 1b – Proportion of “yes” answers (all scalar items), detailed conditions
Existence of the readings:
We used a linear mixed model to predict the answer (yes vs. no), with Condition as fixed factor and
Subject as random factor. Three contrasts enabled us to distinguish between the three readings.
We first ran the tests for all scalar items together, and then for each scalar item.
All scalar items combined:
(1) First, we compared ∅ to L in order to detect LIT. There was a significant difference in
the rate of “yes” answers between the two cases (χ²(1) = 67, p < 4×10⁻¹⁶ ***).
(2) We then compared L to LP in order to detect the PSI (χ²(1) = 13, p < .001 ***).
(3) We finally compared LP to LPS in order to detect the SSI (χ²(1) = 70, p < 3×10⁻¹⁶ ***).
Effect of scalar item:
We ran the same three tests on each scalar item (see Figure 1c). The table below summarizes the
results.
Scalar item | ∅ vs L (LIT) | L vs LP (PSI) | LP vs LPS (SSI)
SOME | χ²(1) = 63.496, p = 1.607e-15 *** | χ²(1) = 17.831, p = 2.414e-05 *** | χ²(1) = 8.1926, p = 0.004206 **
ALMOST | χ²(1) = 10.968, p = 0.0009269 *** | χ²(1) = 21.554, p = 3.44e-06 *** | χ²(1) = 38.96, p = 4.326e-10 ***
TWO | χ²(1) = 7.0938, p = 0.007735 ** | χ²(1) = 0.62, p = .4301 | χ²(1) = 43.612, p = 4.003e-11 ***
PLURAL | χ²(1) = 13.949, p = 0.0001878 *** | χ²(1) = 0.35, p = .557 | χ²(1) = 57.998, p = 2.624e-14 ***
For SOME and ALMOST, we detect the three levels of reading: the differences are significant for the
three tests. For TWO and PLURAL, we detect LIT and SSI, but not PSI (TWO: χ²(1) = 0.62, p = .43; PLURAL:
χ²(1) = 0.35, p = .56).
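Since all the tests reported here have one degree of freedom, the link between a χ² value and its p-value can be checked by hand: for one degree of freedom, the upper-tail probability of the χ² distribution is erfc(√(χ²/2)). The Python sketch below applies this identity to some of the reported values. It is only a consistency check, not a reimplementation of the mixed-model comparisons, and the function name is hypothetical.

```python
import math


def p_value_chi2_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # With 1 df, chi2 is the square of a standard normal variable Z,
    # so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


for label, chi2 in [("SOME, L vs LP", 17.831), ("SOME, LP vs LPS", 8.1926),
                    ("TWO, L vs LP", 0.62), ("PLURAL, L vs LP", 0.35)]:
    print(f"{label}: chi2(1) = {chi2}, p = {p_value_chi2_1df(chi2):.4g}")
```

Small discrepancies with the reported p-values are to be expected for the statistics that are given with only two decimals.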
To check this difference in behavior between SOME and ALMOST on the one hand, and
TWO and PLURAL on the other, we also tested the interactions between the different scalar items
for the PSI test. Results are summed up below. As expected, the interaction is not significant in two
cases: SOME vs ALMOST and TWO vs PLURAL.
L vs LP | ALMOST | TWO | PLURAL
SOME | χ²(1) = 0.0966, p = 0.756 | χ²(1) = 32.34, p = 1.291e-08 *** | χ²(1) = 39.547, p = 3.20e-10 ***
ALMOST | - | χ²(1) = 13.96, p = 0.0001859 *** | χ²(1) = 18.11, p = 2.085e-05 ***
TWO | - | - | χ²(1) = 1.1495, p = 0.2836
Figure 1c – Proportion of “yes” answers by condition and scalar item
Readings by subjects
Figure 2 shows the responses of each subject combining the two target conditions L and LP, for each
scalar item tested. Each data point corresponds to a subject (jitter was added in order to make the
results readable). Subjects that consistently access LIT are in the top right corner (they answer “yes”
in both cases). Subjects that derive the PSI but not the SSI are in the top left corner (they answer “no”
in L and “yes” in LP). Subjects that consistently access the SSI are in the bottom left corner (they
answer “no” in both cases). Crucially, we see that for SOME and ALMOST, there is nearly no subject
answering “no” in LP but “yes” in L, whereas this is not the case for NUMERALS and PLURAL.
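The mapping between a pair of answers and a reading, as defined for the target conditions L and LP in the method section, can be made explicit with a short Python sketch. The function name and the label "incoherent" for the fourth pattern are assumptions made here for illustration.

```python
def reading_from_answers(yes_in_L, yes_in_LP):
    """Map a pair of answers in the target conditions L and LP onto a reading."""
    if yes_in_L and yes_in_LP:
        return "LIT"         # accepts the sentence in both target conditions
    if not yes_in_L and yes_in_LP:
        return "PSI"         # rejects it only when the speaker knows the alternative is true
    if not yes_in_L and not yes_in_LP:
        return "SSI"         # rejects it in both target conditions
    return "incoherent"      # "yes" in L but "no" in LP: no single reading predicts this


for answers in [(True, True), (False, True), (False, False), (True, False)]:
    print(answers, "->", reading_from_answers(*answers))
```

The fourth pattern corresponds to the subjects answering “no” in LP but “yes” in L.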
Figure 2 – Readings by subject
Response times (description)
Figure 3 shows the mean response time by condition (we removed RT > 10000ms).
Descriptively, it seems that our results replicate the classical findings (Bott & Noveck, 2004): in the L condition, the
mean response time for “no” answers is higher than for “yes” answers (compared to the mean response times in
control conditions).
In LP, the pattern is reversed: answering “yes” takes slightly more time than answering “no”. This does not
match the hypothesis of the Gricean account, which predicts a higher response time for “no” answers than for
“yes” answers. We will return to the possible explanations for this result in the discussion.
Figure 3 – Mean response times by answer and condition
Before turning to the discussion, we are going to present a control experiment we had to run for
Experiment 1a.
4. Experiment 1b: control experiment
4.1. Goal
Our first experiment is open to one criticism: all conditions presenting an ignorant speaker (i.e.
seeing ? cards) were either target conditions (LP) or “no” controls (∅). This means that there was no
control condition in which the participant had to answer “yes” with an ignorant speaker¹². In order to
control for this possible bias, we ran another experiment, identical to Experiment 1, except
that we included a control condition where the subject had to answer “yes” even though the speaker was
ignorant.
4.2. Method and materials
The material was exactly the same as in Experiment 1, except that we added a version of the LPS
condition (which corresponds to the “yes” control condition) in which the speaker was ignorant.
(We will refer to it as “LPS-IGNO”, as opposed to “LPS-KN” for the previous version).
- For the sentences testing SOME and PLURAL, it corresponded to the cards ♥ ♥ ♥ ♤ ♤ ? ? ?
- For the sentences using TWO, ALMOST, ALL and NO, one needs to have all the information to
answer “yes”: it was therefore not possible to implement this condition.
In order to increase the proportion of conditions in which the expected answer was “yes” despite
ignorance of the speaker, we also added sentences with FIRST and LAST (Example: “The first card is a
heart”, with ♥ ♥ ♤ ♤ ? ? ? ?). We will also refer to these conditions as “LPS-IGNO” condition. We also
added control trials corresponding to “LPS-KN”, “∅-IGNO”, and “∅-KN”.
Experiment 1b thus consisted of 320 trials. The proportion of controls (LPS and ∅) and target cases
was the same as in experiment 1. 20% of LPS were “LPS-IGNO”.
The procedure was exactly the same as in Experiment 1a.
4.3. Participants
60 participants were recruited via Mechanical Turk. 59 of them completed the task. We removed
from the analysis 2 participants whose native language was not English, 6 who made more than
46,3% (m-2sd) of errors on controls, and 6 who made more than 30,5% of errors on NO-control. We
thus present the results for 45 participants (30 females, 15 males; mean age: 35,5, from 19 to 61 y.o.).
12 To be exact, it was the case in two of the examples given during the training phase. This seemed not strong
enough, however, to argue that the absence of a “LPS-IGNO” condition could not influence our results, creating a bias to answer “no” when the speaker was ignorant.
4.4. Results
Analysis of responses
Data treatment: We removed the trials that were below 200ms and above 20000ms (1,43% of the
data). We then removed, for each participant, the trials that fell outside m ± 2sd,
keeping 94,8% of the data.
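The two-step trimming described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `trim_trials` and the `(participant, rt_ms)` data layout are hypothetical, the bounds are treated as inclusive, and the per-participant mean and (sample) standard deviation are assumed to be computed after the absolute cutoffs have been applied. The text does not specify these details.

```python
from statistics import mean, stdev

def trim_trials(trials, low=200, high=20000, n_sd=2):
    """Trim response times in two steps.

    trials: list of (participant_id, rt_ms) pairs (hypothetical layout).
    Step 1 drops trials outside [low, high] ms.
    Step 2 drops, per participant, trials further than n_sd standard
    deviations from that participant's mean (assumed order of operations).
    """
    kept = [(p, rt) for p, rt in trials if low <= rt <= high]

    # Group the remaining response times by participant.
    by_participant = {}
    for p, rt in kept:
        by_participant.setdefault(p, []).append(rt)

    result = []
    for p, rts in by_participant.items():
        if len(rts) < 2:
            # A standard deviation is undefined for a single trial: keep it.
            result.extend((p, rt) for rt in rts)
            continue
        m, sd = mean(rts), stdev(rts)
        result.extend((p, rt) for rt in rts if abs(rt - m) <= n_sd * sd)
    return result
```

Applied to one participant's trials, the first step removes implausibly short or long responses, and the second removes the remaining outliers relative to that participant's own distribution.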
Figure 1a shows the proportion of “yes” in the 4 conditions, Figure 1b according to the detailed
conditions.
Figure 1a – Proportion of “yes” answers
(4 conditions)
Figure 1b – Proportion of “yes” answers
(detailed conditions)
As in experiment 1, the mean of errors on controls was very low (LPS: 5,4% (LPS-KN: 4,1 %, LPS-IGNO:
8,5%) ; ∅: 1,3% (∅-KN: 1,5 %, ∅-IGNO: 1,1 %)).
The mean rate of “yes” answers in L, corresponding to LIT alone, was 30% (vs 34% in experiment 1a).
The mean rate of “no” answers in LP, corresponding to SSI alone, was 53% (vs 45% in experiment
1a). PSI thus represented 23% of the readings.
As in experiment 1a, we first checked that there was no significant difference between the different
instantiations of the condition LP and ∅. In particular, we checked that there was no difference
between the two instantiations of the LPS condition (χ2(1)=0.64 , p=.42).
Existence of the readings
As in experiment 1, three tests enabled us to distinguish between the three readings. We first did the
test for all scalar items (see Figure 1a), and then for each scalar item (Figure 1c). The table below
summarizes the results.
             ∅ vs L - LIT                       L vs LP - PSI                      LP vs LPS - SSI
All quanti   χ²(1)=54.806, p=1.33e-13 ***       χ²(1)=17.744, p=2.528e-05 ***      χ²(1)=62.792, p=2.298e-15 ***
SOME         χ²(1)=20.301, p=6.617e-06 ***      χ²(1)=21.292, p=3.943e-06 ***      χ²(1)=13.685, p=0.0002162 ***
ALMOST       χ²(1)=8.1686, p=0.004262 **        χ²(1)=18.19, p=1.999e-05 ***       χ²(1)=44.808, p=2.173e-11 ***
TWO          χ²(1)=5.36, p=0.0206 *             χ²(1)=0.7503, p=0.3864             χ²(1)=69.391, p < 2.2e-16 ***
PLURAL       χ²(1)=18.122, p=2.072e-05 ***      χ²(1)=2.1696, p=0.1408             χ²(1)=9.111, p=0.002541 **
Figure 1c – Proportion of “yes” by condition and scalar item
We also tested the interactions between the different scalar items for the PSI-test: as in 1a, the
interaction is not significant in two cases (SOME vs ALMOST and TWO vs PLURAL).
L vs LP ALMOST TWO PLURAL
SOME χ2(1)=0.0878 , p=0.767 χ2(1)=19.18 , p=1.189e-05 *** χ2(1)=29.757, p=4.898e-08 ***
ALMOST - χ2(1)=7.9883 , p=0.004708 ** χ2(1)= 9.8767 , p=0.001674 **
TWO - - χ2(1)=8e-04 , p=0.977
Readings by subjects
Figure 2 shows the responses of each subject combining the two target conditions. As in experiment
1a, we see that for SOME and ALMOST, there is nearly no subject answering “yes” in L and answering
“no” in LP, whereas this is not the case for NUMERALS and PLURAL.
Response times
Figure 3a shows the mean response time by condition. Figure 3b shows the results for the eight
conditions. (We removed RT>10000ms). Descriptively, there is no difference between the mean
response times of “yes” and “no” answers in the L condition. However, when compared to the
controls, we find the same “cost” as in 1a: subjects are slower to answer “no” than “yes”.
Figure 2 – Readings by subject
Figure 3a – mean Response Times by answer
4.5. Conclusion
This control experiment showed that the presence of a “LPS-IGNO” condition did not influence our
results. First, there was no significant difference between the rate of correct answers in “LPS-KN” and
“LPS-IGNO”. Second, we still detect the SSI (53% of “no” in condition LP), at an even higher rate
than in Experiment 1a (45%). Interestingly, Experiment 1b confirmed that Response Times were not
very reliable (or at least hard to analyze): it was the only point on which we did not replicate
previous results.
Discussion for Experiment 1a and 1b
The aim of these experiments was twofold: first, we wanted to show that three levels of readings
existed, LIT, PSI and SSI, for standard scalar items (SOME); second, we wanted to compare the
behavior of three debated cases of Scalar Implicatures: ALMOST, NUMERALS and PLURAL.
Existence of the readings
For SOME, our results confirmed that three readings could be distinguished: LIT, PSI and SSI. This
means that for standard scalar items, the SI classically opposed to LIT in experimental studies can be
further decomposed into PSI and SSI, as predicted by the Neo-Gricean account of SI.
The existence of SSI per se had not been straightforwardly established previously (Chemla & Singh,
2013; for arguable attempts, see Grodner, unpublished). Our paradigm enables us to distinguish it from the
PSI: when the speaker is ignorant about the truth value of the stronger alternative, the sentence is
rejected if the participant accesses SSI (“the speaker believes that some but not all”), but accepted if
she accesses PSI or LIT.
Importantly, we show that the SSI can be accessed even when the speaker is presented as ignorant.
As explained in the Introduction, on standard Neo-Gricean accounts, the “Epistemic Step” (from PSI
to SSI) involves an assumption that the speaker is knowledgeable, called the “Competence
Assumption” (Sauerland, 2004). When this assumption is not warranted, i.e. when the listener knows
that the speaker does not have full knowledge, listeners should not compute SSI.
Let’s note that we controlled for the fact that participants may have judged the sentence according
to the actual situation (and not the beliefs of the speaker): even in cases where the sentence was
true according to the other player’s cards, participants rejected the sentence: this shows that they
accessed the SSI, and did not reject the sentence because it could be false.
Here, we show that participants can access the SSI reading whatever the information level of the
speaker. This does not mean that the level of information does not influence the likelihood of
deriving the SSI, as shown by Bergen & Grodner (2012), but challenges the strict Neo-Gricean
account on a specific point: the factors that play a role in the step from PSI to SSI. It is possible to
adapt the Gricean account, adding that the Epistemic Step does not rely (or, at least, not only) on the
Competence Assumption. Our results do not directly challenge the assumption that the computation
is incremental.
One could argue that our paradigm - a judgment on the appropriateness of a sentence - is not
ecological enough and does not represent a very naturalistic assessment of the computation of SI.
We tried to make the context not too remote from a “real life” situation (a card game with two
named players speaking), but it can still be argued that in “real life” situations, the SSI would never
be accessed in ignorant speaker conditions, because there is a qualitative difference between a real
human being and a fictional character.
The second new result is that PSI exist, and are distinct from SSI, as predicted by the Neo-Gricean
account. We do not have a condition in which only the PSI is true as for LIT or false as for SSI: the
result is based on the comparison between the rates of answer in two conditions, one in which the
speaker knows that the stronger alternative is true (L), and the other one in which the speaker is
ignorant regarding the truth value of the stronger alternative (LP). We make the assumption
(supported by previous studies) that subjects are coherent in their readings between the different
conditions, an assumption that is further supported by looking at the readings by subjects.
One could argue that the effect is due to the effect of the speaker’s information level only: the
decrease of “no” answers between L and LP could be just due to the fact that you derive more SSI
when the speaker is knowledgeable. Let’s assume that PSI do not exist and that all “no” answers in
the L condition correspond to SSI readings. Two facts show that this criticism does not hold: first,
studying the pattern of answers by subjects, we see that nearly no subject derives the SSI but not the
PSI (almost all subjects answering “no” in the LP condition, i.e. accessing the SSI, also answer “no” in
the L condition), whereas there is a group of subjects accessing the PSI but not the SSI.
Second, and perhaps even more convincingly, the comparison of the behavior between scalar items
shows that the difference in the rate of “no” answers between the conditions L and LP cannot
depend solely on the level of information of the speaker: for NUMERALS and PLURAL, the difference
between L and LP is not significant. Let’s turn to this second issue.
Comparison between scalar items
Our second goal was to inform theoretical debates on three contentious cases that have been
analyzed in terms of SI. Our results show that the relative proportions of LIT, PSI and SSI differ for
SOME, ALMOST, NUMERALS and PLURAL: whereas for SOME and ALMOST, there are three distinct
readings, for NUMERALS and PLURAL, there are only two distinct readings: LIT and SSI.
For ALMOST, our results strongly support an SI account (Sadock, 1981, Spector, 2012). First, we show
that a literal reading exists, which to our knowledge had never been established: a sentence such as
“ALMOST ALL cards are hearts” is accepted when all cards are hearts. This is challenging for an
account of the “not all” meaning component of ALMOST as a logical entailment (Hitzeman, 1992;
Horn, 2011; Kilbourn-Ceron, 2015). This conclusion is further supported by the striking similarities of
behavior between SOME and ALMOST: even if the rate of derivation differs (20% of LIT for ALMOST,
35% for SOME), which is not very surprising, given the study by van Tiel & al. (2013), we find the
same overall distribution of PSI and SSI. Let’s note that we tested two sentences for ALMOST
(ALMOST ALL and ALMOST NO): the same pattern shows up with the two sentences, the rate of
strengthened meaning being a little higher with ALMOST ALL than ALMOST NO.
Finally, the result for ALMOST strengthens the conclusion we draw for SOME, showing that the result
is likely to generalize to other items.
Regarding NUMERALS and PLURAL, we find a different pattern of answers, suggesting that there is
no PSI, or, at least, that PSI is less accessible. This suggests that the underlying mechanism is not the
same as for standard scalar items.
For NUMERALS, this new result adds to other differences already found between numerals and
standard scalar items, regarding syntactic distribution (Horn, 1992, Breheny, 2008), acquisition
(Papafragou & Musolino, 2003; Huang & Snedeker, 2009) and processing (Huang & Snedeker, 2009;
Marty & al., 2013), and challenges the traditional SI account proposed by Horn (1972).
For PLURAL, in the same way, our results suggest that the “strictly more than one” meaning
component is not obtained by the same mechanisms as standard scalar items. The parallel with
NUMERALS remains to be explored with other studies.
Processing properties
Still, both Gricean and Grammatical accounts can account for the bare existence of the readings, and
even for the observations by subjects. A way to inform the debate is to study their processing
properties.
In this experiment, we wanted to address this question by studying response times. Descriptively, it
seems that our results replicate the classical findings (Bott & Noveck, 2004): in L condition, there is a
delay associated with the computation of the implicature. In LP, however, the pattern is reversed:
answering “yes” takes slightly more time than answering “no”; but when compared to the control
conditions, there is no difference. This is not what is predicted by the Gricean account, where (if
anything) there should be a cost associated to the step from PSI to SSI. But this can be explained by
other factors: in particular, in LP, ? cards are displayed, which might make the answer “yes” harder to
get.
Given that this was an online experiment and that we had not implemented the adequate controls
(e.g. counterbalancing for the position of “yes” and “no” answers on the screen), we decided not to
go further in this analysis, and to use another methodology to assess the processing cost associated
with the derivation of PSI and SSI: a dual-task experiment.
5. Experiment 2: dual-task impact on primary and secondary
implicatures
5.1. Goal
The main goal of this experiment was to study the processing properties of the three levels of
reading established in Experiment 1. This was also a way to have a better understanding of the “cost”
traditionally associated to SI.
As in Experiment 1, our second goal was to compare the behavior of four scalar items: SOME,
ALMOST, NUMERALS and PLURAL.
The dual-task methodology was interesting for us in at least two respects: first, it was more
compatible with an online experiment than the study of Response Time; second, it could bring more
precise information on the processing, indicating whether memory resources are involved.
As its name suggests, in a dual-task experiment, participants are asked to perform two tasks at the
same time. The underlying reasoning is based on the working memory model, which we will only
briefly explain here for reasons of space (but see Baddeley 1992, Miyake & Shah 1999, Engle, 2002).
Crucially, the paradigm relies on the assumption that human executive cognitive resources (i.e.
working memory resources) are limited: introducing a second task reduces the resources available for
the first task, facilitating automatic responses and inhibiting analytic responses (de Neys, 2006).
Studying SI, this could be a way to single out the “basic” meaning of a sentence, by blocking potential
strengthening mechanisms.
From a methodological point of view, the paradigm is based on the comparison between two
conditions of Cognitive Load, a factor characterizing the degree to which working memory resources
are burdened: a LOW-CL condition (with an easy second task), a HIGH-CL condition (with a harder
second task). We also added a NO-CL condition (baseline, without the second task).
Four studies have tested SI using a dual-task paradigm (De Neys & Schaeken, 2007; Dieussaert & al.,
2011; Marty & Chemla, 2011; Marty & al., 2013). These four studies have shown that, for sentences
containing SOME, participants derived fewer SI as the second task became harder. This was a way to
confirm, in line with other studies using Response Time, self-paced reading or visual-world paradigm,
that SI were “costly” as compared to LIT, i.e. were not generated automatically as proposed by
Levinson (2000). Moreover, the dual-task methodology is, in a way, more precise than Response
Time studies: it indicates whether the cognitive effort associated with the processing involves central
working memory resources, whereas the conclusion we can draw from Response Time studies is that
SI are derived later than LIT, which does not characterize the nature of the resources involved.
In these four studies, PSI and SSI are confounded. Using our paradigm was also a way to understand
at which level of SI processing working memory resources were specifically involved.
The second goal of this experiment was, as in Experiment 1, to compare the behavior of four scalar
items. Interestingly, Marty & al. (2013) found that the effect of the dual task was reversed for
NUMERALS and SOME: for NUMERALS, participants accessed the “exactly n” reading (“implicated” reading)
more often under high cognitive load, whereas for SOME, they accessed the “some and possibly all”
reading (“literal” reading) more often. This result also drew us to choose this methodology. We used nearly
exactly the same paradigm, which allowed us to check that we could replicate - and then extend -
their results.
5.2. Method and materials
The experiment was an online experiment, hosted on Alex Drummond’s Ibex Farm. Participants were
recruited via Mechanical Turk and were paid for their participation.
Participants had to do two tasks at the same time:
- a truth value judgment task, identical to the task implemented in Experiment 1.
- a letter memory task, very similar to the task implemented by Marty & al (2013).
5.2.1. Truth Value Judgment Task
As in Experiment 1, each trial consisted of a picture and a sentence.
Sentences
The sentence was always of the form “X cards are [hearts]”.
As in experiment 1, four scalar items were tested:
SOME “Some of the cards are [hearts].”
TWO “Two cards are [hearts].”
PLURAL “Some cards are [hearts].”
ALMOST “Almost all cards are [hearts].” / “Almost no card is a [heart].”
We also included control sentences with NO and ALL.
Pictures
Each picture was composed of two sets of eight cards. Each set of cards corresponded to the beliefs
of a player, Peter or Mary. We manipulated the information level of the players by putting some of
the cards face-down, with the symbol ? on them.
At each trial, a sentence was attributed to one of the players. The participant had to judge whether
the speaker could or could not have said the sentence given her informational state.
The cards displayed corresponded to four conditions, depending on the readings they made true:
Condition          ∅ (Control)                            L (Target)         LP (Target)        LPS (Control)
Speaker’s cards    ♤ ♤ ♤ ♤ ♤ ♤ ♤ ♤ / ♤ ♤ ♤ ♤ ? ? ? ? *    ♥ ♥ ♥ ♥ ♥ ♥ ♥ ♥    ♥ ♥ ♥ ♥ ? ? ? ?    ♥ ♥ ♥ ♥ ♤ ♤ ♤ ♤
LIT                NO                                     YES                YES                YES
PSI                NO                                     NO                 YES                YES
SSI                NO                                     NO                 NO                 YES
∅ and LPS are control conditions; L and LP are target conditions. As in experiment 1b, they
corresponded to 8 actual conditions, depending on the cards of the other player.
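The predictions in the table can be restated as a small Python sketch, which may help when reading the by-subject figures. The names `TRUE_READINGS`, `predicted_answer` and `classify_subject` are hypothetical and introduced for illustration only; the mapping itself is taken from the table above.

```python
# Readings ordered from weakest to strongest.
READINGS = ("LIT", "PSI", "SSI")

# For each condition, the set of readings that the speaker's cards make true.
TRUE_READINGS = {
    "∅": set(),
    "L": {"LIT"},
    "LP": {"LIT", "PSI"},
    "LPS": {"LIT", "PSI", "SSI"},
}

def predicted_answer(reading, condition):
    """Answer expected from a participant who accesses `reading`."""
    return "yes" if reading in TRUE_READINGS[condition] else "no"

def classify_subject(answer_l, answer_lp):
    """Map a pair of answers in the two target conditions to a reading.

    Returns None for the pattern ("yes" in L, "no" in LP), which none of
    the three readings predicts.
    """
    for reading in READINGS:
        pattern = (predicted_answer(reading, "L"),
                   predicted_answer(reading, "LP"))
        if pattern == (answer_l, answer_lp):
            return reading
    return None
```

For instance, a participant answering “no” in L and “yes” in LP is classified as accessing the PSI but not the SSI, while “no” in both target conditions corresponds to the SSI.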
5.2.2. Letter Memory Task
The memory task was a short term storage task of sequences of letters, based on the task
implemented by Marty & al.
Before the Truth Value Judgment Task, a sequence of letters was shown to the participants. The
letters were presented one after the other for 800ms, with a 50ms pause between them. They were
displayed in the center of the screen, in black, in upper case.
The sequences were generated randomly using a program in Python. We used 9 letters: B, F, H, J, L,
M, Q, R and X (chosen to be phonologically distinct).
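A minimal sketch of such a generator is given below. It is not the program actually used: the function name `make_sequence` is hypothetical, and the assumption that a letter cannot be repeated within a sequence is an illustrative choice, since the text only specifies the letter set and that sequences were random.

```python
import random

# The 9 letters listed in the text.
LETTERS = "BFHJLMQRX"

def make_sequence(length, rng=random):
    """Draw a random letter sequence of the given length.

    Assumption: letters are sampled without repetition within a sequence.
    `rng` can be a seeded random.Random instance for reproducibility.
    """
    return "".join(rng.sample(LETTERS, length))
```

Passing a seeded `random.Random` instance makes the generated sequences reproducible across runs, which is convenient when the material has to be regenerated.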
After the Truth Value Judgment Task, participants had to give back the sequence of letters in reverse
order.
They were given feedback at the end of the trial: either “Correct”, displayed in green in the center of
the screen, or “Wrong”, displayed in red, with an error message (ex. “You typed DL and the correct
answer was LF.”)
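The scoring of a response and the feedback message can be sketched as follows. The function name `check_recall` is hypothetical, the normalisation of the typed response (trimming spaces, upper-casing) is an assumption, and the exact wording displayed to participants is only approximated from the example above.

```python
def check_recall(shown, typed):
    """Return (is_correct, feedback) for one Letter Memory Task response.

    shown: the sequence as presented, e.g. "FL".
    typed: the participant's response, expected to be `shown` reversed.
    """
    expected = shown[::-1]
    # Assumption: responses are compared after trimming and upper-casing.
    response = typed.strip().upper()
    if response == expected:
        return True, "Correct"
    message = f"Wrong. You typed {response} and the correct answer was {expected}."
    return False, message
```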
The Cognitive Load was manipulated by varying the length of the sequence of letters: 2 letters in the
LOW-CL and 4 letters in the HIGH-CL condition. Memory resources were supposed to be more heavily
tapped in the HIGH-CL condition. The cognitive load was manipulated within subjects: each participant
performed the LOW-CL as well as the HIGH-CL task. We also implemented a control experiment, with
no dual task, as a baseline.
Participants in the dual-task version of the experiment were administered two blocks of 92 trials,
with a short break between them: one block contained LOW-CL trials, and the other block contained
HIGH-CL trials. The order of the blocks was randomly determined for each participant. In each block,
the order of items and the correspondence between the sequence of letters and the truth-value
judgment task item was generated randomly.
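The randomisation just described can be illustrated with the following sketch. The function `build_session`, the trial dictionaries and the way items are assigned to blocks are hypothetical: the text only states that block order, item order within a block, and the pairing of letter sequences with items were random. Letters are again assumed not to repeat within a sequence.

```python
import random

LETTERS = "BFHJLMQRX"                     # letter set given in the text
SEQ_LENGTH = {"LOW-CL": 2, "HIGH-CL": 4}  # sequence length per load level

def build_session(block_items, rng):
    """Build the trial list for one participant.

    block_items: two lists of truth-value-judgment items, one per block
                 (hypothetical representation of the material).
    rng: a random.Random instance.
    """
    loads = list(SEQ_LENGTH)
    rng.shuffle(loads)            # random order of the two blocks
    session = []
    for load, items in zip(loads, block_items):
        items = list(items)
        rng.shuffle(items)        # random order of items within a block
        for item in items:
            # A fresh random letter sequence is paired with each item.
            letters = "".join(rng.sample(LETTERS, SEQ_LENGTH[load]))
            session.append({"load": load, "item": item, "letters": letters})
    return session
```

With two lists of 92 items, this yields 184 trials in which the first 92 share one load level and the last 92 the other.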
Due to the difficulty of the dual task, we had to reduce the number of trials from Experiment 1b (320
trials). Given that our previous results suggested that the presence of a “LPS-IGNO” condition did not
strongly influence the judgments, we removed sentences with FIRST and LAST, but kept the
subdivision of LPS into a KN and an IGNO condition. Moreover, in order to reduce the number of trials,
we removed half of the ∅ and LPS conditions (i.e. control trials). The proportion of target cases in the
experiment was thus higher than in Experiment 1b: there were 25% ∅, 25% L, 25% LP, 25% LPS (i.e.
50% of target cases). There were 184 trials in total. (See Appendix 5 for the exact distribution of
conditions.)
5.2.3. Procedure
The experiment was an online experiment, hosted on Alex Drummond's Ibex Farm.
After having given their consent to participate in the experiment, participants were given instructions
concerning the Truth Value Judgment Task only. There was then a first training (4 unambiguous
sentences with feedback). Then, participants were given the second part of the instructions,
concerning the Letter Memory Task. There was a second training with the Memory Task, on the same
4 sentences as before. The sequences were composed of 2 or 4 letters depending on the block
they started with.
The experimental phase was divided in two blocks, according to the cognitive load (LOW-CL vs HIGH-CL).
Participants were asked to take a short break between them. The first two sentences after the
break were taken from the training phase, in order to get the subjects used to the new number of
letters to memorize.
Each trial started with the presentation of the sequence of letters. Then, the Truth Value Judgment
task was displayed. It remained until the participant answered. Next, participants had to reproduce
the sequence of letters in reverse order. Last, they were given feedback on the accuracy of their
answer (see Figure 2).
At the end of the experiment, there was a short questionnaire (information on age, sex, native
language, kind of device used to answer, Mechanical Turk Worker ID, and kind of strategy used to
memorize the letters). This last question was included in order to control for the fact that
participants may have written the letters, as it was an online experiment.
For the no dual task version of the experiment, the procedure was exactly the same, except that the
instructions and the final questionnaire were adapted to the task.
5.3. Participants
59 participants were recruited via Mechanical Turk for the dual-task version of the experiment, and
61 for the no-dual task version of the experiment.
In the dual-task version, we had to remove 4 participants due to a problem loading their data, 3 who
made more than 25% of errors on NO-controls, and 1 who indicated that he had written the letters
for the memory task. All participants reported that English was their native language. In the no-dual
task version, we removed from the analysis 3 participants whose native language was not English, 2
that made more than 19% (m-2sd) of errors on controls, and 2 who made more than 25% of errors
on NO-controls. We thus had 51 participants for the dual task version of the experiment (24 LOW-HIGH,
27 HIGH-LOW), and 54 participants for the no dual task version (58 females, 47 males, mean
age: 39,7, from 19 to 65 years old).
Figure 2: Description of the dual-task procedure. Steps in temporal order: (1) Presentation of the sequence of letters
(2 or 4 letters); (2) Truth Value Judgment Task; (3) Letter Memory Task; (4) Feedback on Letter Memory Task.
5.4. Results
Data analyses were conducted using R. We used binomial linear mixed effects models, built with a
maximal random effect structure based on subjects and items as random variables, although we
sometimes had to step back to random-intercepts-only models when the model failed to converge
with the full random-effects specification (following Barr et al., 2013).
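As a reading aid for the statistics reported below, the relation between a χ²(1) statistic and its p-value can be sketched in a few lines of Python. This assumes that the reported statistics are referred to a chi-square distribution with one degree of freedom (for example when comparing nested models, which the text does not spell out); the function name `chi2_df1_pvalue` is hypothetical. The sketch uses the identity P(X > x) = erfc(√(x/2)), which holds for one degree of freedom only.

```python
import math

def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, X is the square of a standard normal variable, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))
```

This makes it possible to check that a reported pair (χ², p) is internally consistent, without refitting any model.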
5.4.1. Letter Memory Task
The mean rate of correct answers on the Memory Task was overall quite high (89%). As expected,
there was a significant effect of the Cognitive Load condition (see figure 3a): participants made more
errors on HIGH load trials than on LOW load trials (correct answers: HIGH: 83,3% (SD: .03) vs LOW: 94,1% (SD: .01);
χ2(1)=46, p=1.178e-11 ***). This confirmed that the 4-letter sequences were more demanding
than the 2-letter sequences.
There was no effect of the order of blocks (χ2 (1) = 0.13 , p= .72) (see Figure 1.)
Tradeoff analysis:
We further tested whether there was an effect of the answer on the Truth-Value Task on the
Memory Task results. Indeed, the cost associated with the SI computation can show up on the rate of
correct answers on the Memory Task. This corresponded to the following hypothesis: in condition L
and LP, there should be more errors if the participant previously answered “no” (i.e. the implicature
was derived) than if he previously answered “yes”; in contrast, in control conditions, the previous
answer should not affect the rate of errors.
Figure 1 – Proportion of correct answers
(Memory Task)
(by Cognitive Load condition and order of blocks)
Figure 2 – Proportion of correct answers
(Memory Task)
(by Condition and Answer on the TVJT)
The results were not significant whatever the Cognitive Load (LOW, HIGH or merged): there was no
effect of the answer given on TVJT (condition L, all scalar items and Cognitive Load conditions
pooled: χ2 (1) = 0.3227, p = 0.57; condition LP: χ2 (1) = 0.3852, p = 0.5348). There was no effect
either when tested by scalar item. This means that there was no tradeoff between the two tasks.
5.4.2. Truth value judgment task
First, we checked whether we replicated the findings of Experiment 1 for the no-dual-task version of
the experiment (see Figure 3a):
NO DUAL-TASK ∅ vs L - LIT L vs LP - PSI LP vs LPS - SSI
All items χ²(1)= 66.113, p=4.259e-16 *** χ²(1)= 32.435, p=1.233e-08 *** χ²(1)= 126.88, p < 2.2e-16 ***
SOME χ²(1)= 15.333, p=9.012e-05 *** χ²(1)= 30.836, p=2.807e-08 *** χ²(1)= 12.692, p=0.0003671 ***
ALMOST χ²(1)= 11.826, p= 0.000584 *** χ²(1)= 32.23, p=1.369e-08 *** χ²(1)= 20.93, p=4.763e-06 ***
TWO χ²(1)= 10.087, p=0.001493 ** χ²(1)= 8.0943, p= 0.00444 ** χ²(1)= 33.47, p=7.239e-09 ***
PLURAL χ²(1)= 27.702, p=1.415e-07 *** χ²(1)= 1.4909, p= 0.2221 χ²(1)= 10.066, p=0.00151 **
We replicated the results of Experiment 1, except that for NUMERALS, the difference between L and
LP was now significant (χ²(1)=8.0943,p=0.00444**): we detected the PSI. The interaction between
TWO and PLURAL was significant (χ²(1)=4.2033, p=0.04035*), but it was also significant between
TWO and SOME (χ²(1)=27.134, p=1.899e-07***) and between TWO and ALMOST (χ²(1)= 15.80,
p=7.037e-05***). The interaction between SOME and ALMOST was, on the contrary, not significant
(χ²(1)=0.6958, p=0.4042).
Figure 3a –Truth value judgment task
(No dual-task experiment)
Figure 3b – Truth value judgment task
(Dual-task experiment)
Dual-task experiment:
In subsequent analyses, trials with incorrect answers on the Memory Task were removed (about
11,3% of the trials).
First, we conducted the same analysis on the dual-task version of the experiment (see Figure 3b):
WITH DUAL-TASK ∅ vs L - LIT L vs LP - PSI LP vs LPS - SSI
All quanti χ²(1)= 53.132, p=3.119e-13 *** χ²(1)= 18.674, p=1.551e-05 *** χ²(1)= 45.93, p=1.226e-11 ***
SOME χ²(1)= 19.035, p=1.284e-05 *** χ²(1)= 18.253, p=1.934e-05 *** χ²(1)= 28.977, p=7.324e-08 ***
ALMOST χ²(1)= 8.7863, p= 0.003035 ** χ²(1)= 22.338, p=2.286e-06 *** χ²(1)= 9.943, p= 0.001615 **
TWO χ²(1)= 50.953, p=9.462e-13 *** χ²(1)= 3.4403, p= 0.06362 χ²(1)= 23.617, p=1.175e-06 ***
PLURAL χ²(1)= 55.567, p=9.033e-14 *** χ²(1)= 0.6822, p=0.4088 χ²(1)= 12.692, p=0.0003671 ***
We perfectly replicated findings of Experiment 1 regarding the existence of readings and the
difference between scalar items. Note that the result for TWO regarding the PSI is nearly significant.
Effect of Cognitive Load
Figure 4 shows the effect of Cognitive Load (comparing NO Cognitive Load (baseline), LOW Cognitive
Load and HIGH Cognitive Load) for each scalar item, depending on the condition.
For SOME and ALMOST, our first hypothesis was that the proportion of LIT would increase with
Cognitive Load, following results of previous studies. For NUMERALS, we expected the reverse
pattern, as found by Marty & al. (2013): the proportion of LIT would decrease when the Cognitive Load is higher.
We had no precise expectation concerning PLURAL.
Following the simplest Gricean account (assuming that the effect is mostly due to the retrieval and
manipulation of the alternative sentence), PSI and SSI should be impacted by the Cognitive Load in
the same way.
First, we tested the effect of having a dual task on the 3 readings, comparing the no-dual-task (NO
CL) to the dual-task version of the experiment (merging HIGH and LOW CL). We tested the interaction
between the condition and the version of the experiment for each scalar item:
NO CL vs CL ∅ vs L - LIT L vs LP - PSI LP vs LPS - SSI
SOME χ²(1)= 0.0195, p=0.889 χ²(1)= 2.7219, p=0.09898 χ²(1)= 0.4567, p=0.4992
ALMOST χ²(1)= 10.06, p=0.001515 ** χ²(1)= 0.1736, p=0.6769 χ²(1)= 0, p= 0.9969
TWO χ²(1)= 2.8795, p= 0.08971 χ²(1)= 0.0028, p=0.9581 χ²(1)= 0.365, p= 0.5457
PLURAL χ²(1)=0.0299, p= 0.8626 χ²(1)=0.1648, p=0.6848 χ²(1)= 6.2872, p= 0.01216 *
The effect of having a dual task is significant for ALMOST on LIT and for PLURAL on SSI. It is nearly
significant for TWO on LIT and for SOME for PSI (although before correction for multiple
comparisons). No general pattern emerges and all we can say is that the effect for ALMOST on LIT
goes in the same direction as the effect that was documented for SOME: there are more LIT readings
in the dual-task version of the experiment.
Figure 4 – Effect of Cognitive Load on the answer, depending on the scalar item
We then tested the effect of the level of Cognitive Load (HIGH vs LOW):
LOW CL vs HIGH CL ∅ vs L - LIT L vs LP - PSI LP vs LPS - SSI
SOME χ²(1)= 3.044, p=0.08104 χ²(1)= 0.1062, p=0.7445 χ²(1)=0.0959, p=0.7568
ALMOST χ²(1)= 2.3602, p=0.1245 χ²(1)= 0.8089, p=0.3684 χ²(1)=0.0014, p=0.9702
TWO χ²(1)= 4.2628, p= 0.03896 * χ²(1)= 0.9796, p= 0.3223 χ²(1)=1.0443, p=0.3068
PLURAL χ²(1)=0.0729, p=0.7872 χ²(1)= 0.9437, p= 0.3313 χ²(1)=0.0333, p=0.8552
The results are not significant for PSI and SSI, whatever the scalar item tested. However, it turns out
to be significant for LIT for TWO (before correction for multiple comparisons though). Contrary to our
hypothesis and to what was obtained by Marty & al., this result corresponds to an increase of LIT for
NUMERALS. Moreover, the effect of Cognitive Load is reversed for SOME (even if it is not significant):
the rate of literal answers decreases when the Cognitive Load is higher.
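To make the caveat about multiple comparisons concrete, the sketch below applies a Bonferroni adjustment to the twelve p-values of the LOW CL vs HIGH CL table. The text does not say which correction was considered: Bonferroni is used purely as an example, and treating the twelve tests of the table as one family is an assumption. The names `P_VALUES` and `bonferroni` are hypothetical.

```python
# p-values copied from the LOW CL vs HIGH CL table above.
P_VALUES = {
    ("SOME", "LIT"): 0.08104, ("SOME", "PSI"): 0.7445, ("SOME", "SSI"): 0.7568,
    ("ALMOST", "LIT"): 0.1245, ("ALMOST", "PSI"): 0.3684, ("ALMOST", "SSI"): 0.9702,
    ("TWO", "LIT"): 0.03896, ("TWO", "PSI"): 0.3223, ("TWO", "SSI"): 0.3068,
    ("PLURAL", "LIT"): 0.7872, ("PLURAL", "PSI"): 0.3313, ("PLURAL", "SSI"): 0.8552,
}

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p is multiplied by the number
    of tests in the family and capped at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}

adjusted = bonferroni(P_VALUES)
# Tests that would remain below the .05 threshold after adjustment.
surviving = [key for key, p in adjusted.items() if p < 0.05]
```

Under these assumptions, the uncorrected effect for TWO on LIT is multiplied by twelve and no longer falls below the .05 threshold, which is why it is reported above with caution.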
5.5. Discussion
This experiment had two main goals: first, testing the effect of cognitive load on the different
readings established in Experiment 1, as a way to help localize the “cost” observed for SI
computation in previous experiments; second, comparing the behavior of four scalar items, by
studying them on a new dimension.
As we did not replicate the findings of previous studies regarding the effect of Cognitive Load on the
computation of SI (De Neys & Schaeken 2007, Marty & Chemla 2013, Marty & al., 2013), and as the
statistical evidence is arguably weak (especially when multiple comparisons are taken into
account), few conclusions can be drawn from the experiment. Even in condition L, which corresponds
to the classical condition used to test SI, the difference between HIGH and LOW CL was not significant,
except for TWO. Unexpectedly, for TWO, this corresponded to more “literal” readings under higher
cognitive load, whereas the pattern was reversed, even if not significant, for ALMOST and SOME:
this is the opposite of what Marty & al. had found.
Comparing the dual-task version to the baseline with no dual task, the results go
in the expected direction, even if overall they turn out not to be significant (except for ALMOST):
whatever the scalar item, there are more LIT readings in the dual-task version of the experiment.
Comparing a dual-task and a no-dual-task experiment does not directly inform us about the involvement
of memory resources: this cost could be due, for example, to the effect of switching between two tasks
instead. It remains to be discussed how that type of interference could affect the derivation.
There are several possible explanations for the fact that we do not replicate previous results.
First, it is possible that our task was too easy: even if the results on the Letter Memory Task alone
showed that 4-letter sequences were more demanding than 2-letter sequences, the overall rate of
correct answers on the Memory Task is quite high. The difference between the two levels of
Cognitive Load was perhaps not strong enough.
Second, contrary to Marty & al., we ran an online experiment, which meant that there were factors
we did not control for. Among others, participants may have written the letters down (we removed the
participant who explicitly indicated that he used this strategy, but cannot be sure that other
participants did not: we regard this as not very likely, but it is an example of what we cannot
physically control in online experiments). Another factor that differed between our experiment
and Marty & al.’s was the linguistic task: they used a graded judgment task, possibly more
difficult than ours.
As we failed to replicate the result of previous studies, it was difficult to compare the cost for PSI and
SSI, which was the primary goal of the experiment. Nevertheless, our results strengthen the findings
of Experiment 1: we still detect three distinct readings for SOME and ALMOST, and only two for
PLURAL, even with a dual-task.
We might be worried by the fact that, for NUMERALS, we detect the PSI in the no-dual-task version of
the experiment. This is not the case in the dual-task version, although the effect is nearly
significant there. The comparison suggests that even if the PSI exists for NUMERALS, it is hard to
access. How can we explain the fact that we detect the PSI in this version of the experiment? It may
be due to the fact that in this version of the experiment, we had to modify the relative proportion
of target and filler sentences: 1 out of 2 were fillers. As argued by De Neys and Schaeken (2007),
adding more target sentences can automatize the strengthening process: it offers more opportunities
to come to a strengthened interpretation, and repetition helps to make a process cognitively less
demanding. If the PSI depends on pragmatic mechanisms, this can account for the result: modifying
the proportion of targets can make it more easily accessible. However, the same result is not
obtained for PLURAL, suggesting that this may not be the only factor involved.
6. General discussion
Summary of the results
Most experimental studies of SI are based on the distinction between the literal meaning (LIT) and
the implicated meaning (SI). In this work, we have established the existence of three levels of
readings, LIT, PSI and SSI, for standard scalar items such as SOME.
This more fine-grained approach to SI enabled us to inform three specific debates, applying our
paradigm to three scalar items: ALMOST, NUMERALS and PLURAL. We compared their behavior to that of
SOME, considered as a “standard” scalar item, regarding the existence of three distinct readings.
This was a way to determine to what extent they could be analyzed as cases of scalar implicatures.
Regarding ALMOST, we detected three distinct readings, as for SOME. This is a new result which
strongly supports the SI account of this expression over others. The result further strengthens the
conclusion for SOME, making it more generalizable. Extending this to other scalar items would be
highly interesting.
Regarding NUMERALS and PLURAL, we only distinguish LIT and SSI with this paradigm. For
NUMERALS, this difference adds to previous findings suggesting that NUMERALS depart from
standard cases of SI. Our result does not directly contradict the idea that the “exactly” reading (the
assumed strengthened meaning) derives from the “at least” reading (the assumed literal
reading), but it shows at least that the mechanisms underlying the computation are different for
NUMERALS and SOME. For PLURAL, the conclusion is similar: this deserves more investigation.
Importantly, our main results have been replicated across four experiments. There are small
variations in the rates of derivation across experiments (see Appendix 8 for a comparison between the
four experiments), but the overall pattern is robust. Factors accounting for the small variability are
diverse: subjects, other sentences tested in the experiment, relative proportion of fillers and targets,
etc.
Consequences for the theories
Let’s turn now to a more general and perhaps more central debate: the mechanism by which SI are
generated. Importantly, both the Gricean and the Grammatical accounts can predict the existence of
three levels of reading. Nonetheless, our results challenge the Gricean account on a specific point: we
show that SSI can be accessed even when the speaker is presented as ignorant, which suggests that
the “Epistemic Step” (from PSI to SSI) (Sauerland, 2004; van Rooij & Schulz, 2004) does not depend
solely on the Competence Assumption. According to the Gricean account, SI computation is mostly
pragmatic in nature, i.e. it rests on a reasoning that takes into account the mental states of the
speaker (including, among others, her informational state): it is difficult to understand why the SSI
could be derived when the speaker cannot be assumed to have an opinion on the truth value of the
stronger alternative.
Open questions
A way to better inform the debate between these two accounts and to strengthen our conclusions
will be to study the processing properties of the readings, an important piece of work that remains
to be done. Indeed, establishing their psychological reality does not directly enable us to understand
how these three readings are related. Orthogonally, having a more fine-grained approach to SI
(distinguishing PSI and SSI) will help to better understand the “cost” associated with the computation
of SI. One way to achieve this could be to study response times in a more controlled way, using a
training phase, as in the paradigm implemented by Bott & Noveck (2004) or Cremers & Chemla (to appear).
To conclude, even if one of the guiding lines of our work was to inform the current debate between the
Gricean and the Grammatical accounts, our results have an interest beyond theoretical debates: they
put constraints on what these theories should or should not predict.
Appendices
1. Abbreviations
These are the main abbreviations used:
Readings
SI Scalar Implicatures
LIT Literal Reading
PSI Primary Scalar Implicature
SSI Secondary Scalar Implicature
(S) a sentence containing a scalar item. Ex: “Some cards are hearts.”
(S’) the stronger alternative to this sentence. Ex: “All cards are hearts.”
B ( x ) The speaker believes that (x)
Names of conditions:
with 4 conditions
∅ the picture makes no reading true
L the picture makes LIT true
LP the picture makes LIT and PSI true but not SSI
LPS the picture makes LIT, PSI and SSI true
with detailed conditions
Control conditions
∅-KN Expected answer: “no”. The speaker is fully informed.
∅-IGNO-t Expected answer: “no”. The speaker is partially informed. The sentence is actually true.
∅-IGNO-f Expected answer: “no”. The speaker is partially informed. The sentence is actually false.
LPS-KN Expected answer: “yes”. The speaker is fully informed.
LPS-IGNO Expected answer: “yes”. The speaker is partially informed.
Target conditions
L Expected answer: “yes” if LIT, “no” if PSI or SSI
LP-t Expected answer: “yes” if LIT or PSI, “no” if SSI. The sentence is actually true.
LP-a Expected answer: “yes” if LIT or PSI, “no” if SSI. The sentence is actually ambiguous.
2. Pilot Experiment
Method and material
The experiment was a truth-value judgment task. At each trial, four cards and a sentence were
presented to the participant. We manipulated the level of information of the speaker by putting
some of the cards face-down. The question asked was: “Can Peter say that?”.
We tested two versions of the experiment:
(a) with two possible answers: TRUE / FALSE
(b) with three possible answers: TRUE / FALSE / NOT ENOUGH INFORMATION
Sentences
Three scalar items were tested: SOME “Some of the cards are hearts.”
NUMERALS “Two/ Three cards are hearts.”
PLURAL “There are cards which are hearts.”
Conditions
Four conditions were tested: ∅: the sentence is false whatever the reading
L: the sentence is true with LIT only
LP: the sentence is true with LIT and PSI but not with SSI
LPS: the sentence is true whatever the reading
∅ and LPS corresponded to control conditions, L and LP to target conditions.
Conditions are named after the initial of the reading(s) the picture makes true.
The expected answers for the conditions are summed up in table 1 below.
Example: Mary: “Some of the cards are hearts.”
Conditions        Control conditions                                      Target conditions
                  ∅ (i)      ∅ (ii)                   LPS        L          LP
Cards displayed   ♤ ♤ ♤ ♤    ♤ ♤ ? ?                  ♥ ♥ ♤ ♤    ♥ ♥ ♥ ♥    ♥ ♥ ? ?
LIT               FALSE      NOT ENOUGH INFORMATION   TRUE       TRUE       TRUE
PSI               FALSE      NOT ENOUGH INFORMATION   TRUE       FALSE      TRUE
SSI               FALSE      NOT ENOUGH INFORMATION   TRUE       FALSE      NOT ENOUGH INFORMATION
NB: the ∅ control condition is divided into two sub-conditions depending on the reason why the
sentence can be rejected: (i) because the sentence is false or (ii) because the speaker does not have
enough information to say that. In the 2-answers version of the pilot, the expected answer for both
(i) and (ii) was FALSE ; in the 3-answers version of the pilot, the expected answer for (i) and (ii) was
respectively FALSE and NOT ENOUGH INFORMATION.
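The pattern of expected answers in Table 1 can be reproduced with a small possible-worlds sketch. It is only an illustration under stated assumptions: cards are reduced to two types ("H" for a heart, "S" for any other card, "?" for a face-down card), "the speaker believes p" is modelled as p being true in every way of filling in the face-down cards, and the three readings of S = "Some of the cards are hearts" with alternative S' = "All of the cards are hearts" are taken to be B(S), B(S) and not B(S'), and B(S) and B(not S'). All function names are hypothetical.

```python
from itertools import product

CARD_TYPES = "HS"  # assumed simplification: heart vs. any other card

def worlds(display: str):
    """Yield every complete deal compatible with what the speaker sees."""
    hidden = [i for i, card in enumerate(display) if card == "?"]
    for fill in product(CARD_TYPES, repeat=len(hidden)):
        cards = list(display)
        for i, card in zip(hidden, fill):
            cards[i] = card
        yield "".join(cards)

def believes(display: str, prop) -> bool:
    """B(prop): prop holds in all worlds compatible with the display."""
    return all(prop(w) for w in worlds(display))

def some(w: str) -> bool:   # literal meaning of S: at least one heart
    return "H" in w

def all_(w: str) -> bool:   # stronger alternative S': only hearts
    return set(w) == {"H"}

def readings(display: str) -> dict:
    """Can the speaker assert S under each reading, given the display?"""
    lit = believes(display, some)
    psi = lit and not believes(display, all_)
    ssi = lit and believes(display, lambda w: not all_(w))
    return {"LIT": lit, "PSI": psi, "SSI": ssi}
```

For instance, `readings("HHHH")` corresponds to condition L, `readings("HH??")` to condition LP and `readings("HHSS")` to condition LPS. The sketch only says whether the sentence is assertable; it does not distinguish FALSE from NOT ENOUGH INFORMATION.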
Table 2 below presents the cards displayed in each condition depending on the scalar item tested:
Conditions   Control conditions                             Target conditions
             ∅ (i)      ∅ (ii)     LPS                      L                               LP
SOME         ♤ ♤ ♤ ♤    ♤ ♤ ? ?    ♥ ♥ ♤ ♤ / ♥ ♥ ♥ ♤        ♥ ♥ ♥ ♥                         ♥ ♥ ? ? / ♥ ♥ ♥ ? *
TWO          ♤ ♤ ♤ ♤    ♤ ♤ ? ?    ♥ ♥ ♤ ♤                  ♥ ♥ ♥ ♥ / ♥ ♥ ♥ ♤ / ♥ ♥ ♥ ?     ♥ ♥ ? ?
THREE        ♤ ♤ ♤ ♤    ♤ ? ? ?    ♥ ♥ ♥ ♤                  ♥ ♥ ♥ ♥                         ♥ ♥ ♥ ?
PLURAL       ♤ ♤ ♤ ♤    ♤ ♤ ? ?    ♥ ♥ ♤ ♤                  ♥ ♤ ♤ ♤                         ♥ ? ? ?
* This case allowed us to see whether the likelihood of the sentence with ALL being
true could increase the rate of derivation of the implicature (when 1 card is
hidden, it is 25% likely that all cards are hearts; when 2 cards are hidden, it is
6.25% likely that all cards are hearts). Results showed that this had no effect.
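The two percentages in this note follow from a simple computation, sketched here under the assumption (not stated explicitly in the text, but the one that yields 25% and 6.25%) that each hidden card is independently and equally likely to belong to any of four suits. The function name is hypothetical.

```python
from fractions import Fraction

def prob_all_hearts(hidden_cards: int, n_suits: int = 4) -> Fraction:
    """Probability that every hidden card is a heart, given that all visible
    cards are hearts and each hidden card is uniform over n_suits suits
    (an assumption made for this illustration)."""
    return Fraction(1, n_suits) ** hidden_cards
```

With one hidden card this gives 1/4, and with two hidden cards 1/16.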
Results
31 participants were recruited via Mechanical Turk for the 3-answers version of the experiment, and
28 for the 2-answers version. Results for the target conditions are presented below.
In the 2-answers version, we failed to detect SSI (there were no “false” answers in condition LP).
In the 3-answers version, we detected SSI (“not enough information” answers in condition LP).
Overall, the rate of LIT was high (“yes” answers in condition L). That made it difficult to really
differentiate PSI from SSI. We thus tried, in the improvements of the pilot, to find ways to increase
the rate of implicated readings (i.e. to diminish literal readings). We finally found that a critical
factor was the number of cards presented: with eight cards, there are more SI and fewer LIT readings.
[Figure: results in the target conditions for SOME, NUMERALS and PLURAL, in the 2-answers and 3-answers versions]
♥: card of the target color
♤: card of another color
?: ignorance card
3. Experiment 2: testing the influence of presenting the stronger alternative
One of the problems of our pilot was that the rate of LIT was quite high (i.e. the rate of SI quite
low), which made it difficult to really differentiate PSI from SSI.
In particular, one factor known to increase the rate of SI is the presentation of sentences containing
the stronger alternative (i.e. when testing the scalar item SOME, sentences with ALL). This has been
shown by Grodner (unpublished): in a truth value judgment task similar to that of Bott & Noveck (2004),
he showed that the rate of SI increased monotonically with the proportion of ALL (vs NO) in fillers.
Beyond this, the real goal of the experiment was to test whether this manipulation impacted the
rates of PSI and SSI in the same way. According to the Gricean account, the generation of the
alternative is involved at the first step of the computation, between LIT and PSI. The manipulation of
the stronger alternative, whatever the exact mechanism involved, impacts the ease with which
alternatives are accessed. According to the Gricean schema, this manipulation should therefore
affect the rate of PSI only, and not the rate of SSI.
Results: As the rate of LIT was too high, we could not distinguish between PSI and SSI, and the effect
of presenting the alternative was thus uninterpretable. We found afterwards that this was likely
linked, as in the pilot, to the number of cards presented (4 cards, whereas there were 8 cards in
experiment 1).
Method and materials
Except for the number of cards and the sentences tested, the paradigm was exactly the same as in
experiment 1. Only SOME was tested.
Sentences
The sentence was always of the form “X cards are [hearts]”.
X could be: “some”, “all”, “some or all”, “no”.
In experiment 2a, we tested only sentences containing SOME (60 sentences).
There were also control sentences with NO, in order to check that the participants correctly
understood the meaning of the ? cards (12 sentences).
In experiment 2b, we presented sentences containing SOME (60 sentences) and sentences
containing its scalar alternative ALL (60 sentences).
There were also control sentences with NO (12 sentences).
In experiment 2c, we presented sentences containing SOME (60 sentences) and sentences containing
SOME OR ALL (60 sentences; see footnote 13).
There were also control sentences with NO (12 sentences).
13 The hypothesis concerning the effect of presenting SOME OR ALL was the same as for ALL: it was supposed to increase
only the PSI readings. From a theoretical point of view, the expression SOME OR ALL forces the hearer to derive the
implicature for SOME. Indeed, according to Hurford’s constraint, A or B is infelicitous when B entails A (e.g. # “Mary saw an
animal or a dog”). In the case of SOME OR ALL (A=SOME and B=ALL), unless you derive the implicature for SOME, ALL entails
SOME. A hearer is thus forced to understand SOME as meaning SOME BUT NOT ALL in this context.
This should increase the rate of SI derived for sentences with SOME in the experiment.
We were also interested in the result for sentences containing SOME OR ALL: it allowed us to see whether the implicature
was derived for OR.
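The reasoning in footnote 13 can be checked mechanically on a toy model. In this sketch, which is an illustration and not part of the experimental materials, a world is simply the number of hearts among four cards, and the constraint is implemented exactly as stated in the footnote ("A or B" is infelicitous when B entails A). The function names are hypothetical.

```python
N_CARDS = 4
WORLDS = range(N_CARDS + 1)  # a world = number of hearts among the four cards

def some(w: int) -> bool:              # literal SOME: at least one heart
    return w >= 1

def all_(w: int) -> bool:              # ALL: every card is a heart
    return w == N_CARDS

def some_but_not_all(w: int) -> bool:  # SOME strengthened by its implicature
    return some(w) and not all_(w)

def entails(p, q) -> bool:
    """p entails q if q is true in every world where p is true."""
    return all(q(w) for w in WORLDS if p(w))

def violates_hurford(a, b) -> bool:
    """'A or B' violates the constraint, as stated in the footnote, when B entails A."""
    return entails(b, a)
```

With literal SOME, `violates_hurford(some, all_)` holds because ALL entails SOME; with the strengthened meaning, `violates_hurford(some_but_not_all, all_)` no longer does, which is why the disjunction is taken to force the implicature.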
Pictures
The picture was composed of two sets of four cards,
as shown in the example on the right.
Conditions
The cards displayed corresponded to four conditions, depending on the readings they made true. As
in experiment 1, ∅ and LPS were control conditions; L and LP were target conditions.
The procedure was the same as the one we presented in experiment 1. Only the number of trials in
the experimental phase differed (72 for 2a, 132 for 2b and 2c).
              Control conditions                              Target conditions
              ∅ (i)                 ∅ (ii)     LPS            L          LP
SOME          ♤ ♤ ♤ ♤               ♤ ♤ ? ?    ♥ ♥ ♤ ♤        ♥ ♥ ♥ ♥    ♥ ♥ ? ?
ALL           ♤ ♤ ♤ ♤ / ♥ ♥ ♤ ♤     ♥ ♥ ? ?    ♥ ♥ ♥ ♥        -          -
SOME OR ALL   ♤ ♤ ♤ ♤               ♤ ♤ ? ?    ♥ ♥ ? ?        ♥ ♥ ♥ ♥    ♥ ♥ ♤ ♤
NO            ♥ ♥ ♤ ♤               ♤ ♤ ? ?    ♤ ♤ ♤ ♤        -          -
Participants
180 participants were recruited via Mechanical Turk. 178 completed the task. We removed from the
analysis 6 participants whose native language was not English, 12 who made too many errors on ∅
and LPS controls and 4 who made too many errors on NO-controls. We thus had 156
participants (2a: 52; 2b: 55; 2c: 50; 83 males, 73 females; mean age: 37.4, from 19 to 74 years old).
Results
The figure below shows the proportion of “yes” answers for SOME depending on the experiment.
As we see on the graph, we hardly detect the SSI. We conducted the same tests as in experiment 1.
The table below summarizes the results.
      ∅ vs L                         L vs LP                          LP vs LPS
2a    χ²(1)=93.77, p< 2.2e-16 ***    χ²(1)=48.205, p=3.838e-12 ***    χ²(1)=21.659, p=3.257e-06 ***
2b    χ²(1)=54.851, p=1.3e-13 ***    χ²(1)=0.477, p=0.4898            χ²(1)=4.496, p=0.03397 *
2c    χ²(1)=113.63, p=2.2e-16 ***    χ²(1)=5.6503, p=0.01745 *        χ²(1)=3.0859, p=0.07898
As opposed to the results of experiment 1, we don’t detect the three readings in experiment 2b and
2c. In order to compare the effect of presenting ALL sentences, we tested the interaction between
experiment (2a/2b/2c) and condition. Results were not significant, as shown below.
            ∅ vs L                      L vs LP                     LP vs LPS
2a vs 2b    χ²(1)=0.1172, p=0.7321      χ²(1)=0.4408, p=0.5068      χ²(1)=0.1172, p=0.7321
2a vs 2c    χ²(1)=0.2885, p=0.5912      χ²(1)=0.1362, p=0.7121      χ²(1)=0.0604, p=0.8059
2b vs 2c    χ²(1)=0.2498, p=0.6172      χ²(1)=0.09, p=0.3876        χ²(1)=0.2579, p=0.6116
Conclusion
We believe that our results can be explained by the number of cards presented (4 cards, as in the
pilot, vs 8 cards in experiment 1). Other factors could be involved, for example the environment
(effect of testing other scalar items in experiment 1).
It would be highly interesting to explore this question further and to run this experiment with 8 cards
(we did not because we were more interested in the processing of the readings). Indeed, it would
enable us to test another prediction of the Gricean account, namely that the alternatives are involved
during the step from LIT to PSI (which presupposes that there exists a step from LIT to PSI).
Let’s note that another modification should be made: keeping the overall number of sentences in the
experiment constant between 2a and 2b/2c. Indeed, the proportion of targets and fillers is another
factor that may come into play, and in this version of the experiment, it was higher in 2a than in 2b
and 2c.
4. Displayed cards depending on the condition (Experiment 1a, 1b and dual-task)
Mary: “X cards are [hearts]”.
TARGET CONDITIONS CONTROL CONDITIONS
4 condi L LP ∅ LPS
KN condi KN IGNO (i) KN (ii) IGNO KN IGNO
8 condi - The sentence
is true
The sentence
is ambiguous -
The sentence
is true
The sentence
is false - -
Player Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter Mary Peter
SOME ♥♥♥♥
♥♥♥♥
♥♥♥♥
????
♥♥♥♥
????
♥♥♥♥
♤♤♤♤
♥♥♥♥
????
♥♥♥♥
♥♥♥♥
♤♤♤♤
♤♤♤♤
♤♤♤♤
????
♤♤♤♤
????
♤♤♤♤
♥♥♥♥
♤♤♤♤
????
♤♤♤♤
♤♤♤♤
♥♥♥♥
♤♤♤♤
♥♥♥♥
????
♥♥♥♤
???♤
♥♥♥♤
♥♥♥♤
PLURAL ♥♤♤♤
♤♤♤♤
♥♤♤♤
????
♥♤♤♤
????
♥♤♤♤
♥♥♥♤
♥♤♤♤
????
♥♤♤♤
♤♤♤♤
♤♤♤♤
♤♤♤♤
♤♤♤♤
????
♤♤♤♤
????
♤♤♤♤
♥♥♥♥
♤♤♤♤
????
♤♤♤♤
♤♤♤♤
♥♥♥♥
♤♤♤♤
♥♥♥♥
????
♥♥♥♤
???♤
♥♥♥♤
♥♥♥♤
TWO ♥♥♥♥
♤♤♤♤
♥♥♥♥
????
♥♥♤♤
????
♥♥♤♤
♤♤♤♤
♥♥♤♤
????
♥♥♤♤
♥♥♤♤
♤♤♤♤
♤♤♤♤
♤♤♤♤
????
♥♤♤♤
????
♥♤♤♤
♥♤♤♤
♥♤♤♤
????
♥♤♤♤
♥♤♤♤
♥♥♤♤
♤♤♤♤
♥♥♤♤
???? - -
ALMOST
ALL
♥♥♥♥
♥♥♥♥
♥♥♥♥
????
♥♥♥♥
♥♥♥?
♥♥♥♥
♥♥♥♤
♥♥♥♥
♥♥♥?
♥♥♥♥
♥♥♥♥
♤♤♤♤
♤♤♤♤
♤♤♤♤
????
♤???
????
♤♥♥♥
♥♥♥♥
♤???
????
♤♤♤♤
♤♤♤♤
♥♥♥♥
♥♥♥♤
♥♥♥♥
???? - -
ALMOST
NO
♤♤♤♤
♤♤♤♤
♤♤♤♤
????
♤♤♤♤
♤♤♤?
♤♤♤♤
♤♤♤♥
♤♤♤♤
♤♤♤?
♤♤♤♤
♤♤♤♤
♥♥♥♥
♥♥♥♥
♥♥♥♥
????
♥???
????
♥♤♤♤
♤♤♤♤
♥???
????
♥♥♥♥
♥♥♥♥
♥♤♤♤
♤♤♤♤
♥♤♤♤
???? - -
ALL - - - - - - ♥♥♥♥
♤♤♤♤
♥♥♥♥
????
♥♥♥♥
????
♥♥♥♥
♥♥♥♥
♥♥♥♥
????
♥♥♥♥
♤♤♤♤
♥♥♥♥
♥♥♥♥
♥♥♥♥
???? - -
NO - - - - - - ♥♥♥♥
♤♤♤♤
♥♥♥♥
????
♤♤♤♤
????
♤♤♤♤
♤♤♤♤
♤♤♤♤
????
♤♤♤♤
♥♥♥♥
♤♤♤♤
♤♤♤♤
♤♤♤♤
???? - -
FIRST - - - - - - ♤♤♥♥
♤♤♥♥
??♥♥
??♥♥ - -
??♥♥
??♥♥
♤♤♥♥
♤♤♥♥
♥♥♤♤
♥♥♤♤
♥♥♤♤
????
♥♥♤♤
????
♥♥♤♤
♥♥♤♤
LAST - - - - - - ♥♥♤♤
♥♥♤♤
♥♥??
♥♥?? - -
♥♥??
♥♥??
♥♥♤♤
♥♥♤♤
♤♤♥♥
♤♤♥♥
????
♤♤♥♥
????
♤♤♥♥
♤♤♥♥
♤♤♥♥
KN: the speaker is knowledgeable.
IGNO: the speaker is ignorant.
(i) “Mary cannot say that” because the sentence is false and she knows it.
(ii) “Mary cannot say that” because she does not have enough information to say that.
Remark: LP always corresponds to L with some cards put face down.
5. Number of trials by condition (Experiments 1a, 1b and dual-task)
                              TARGET CONDITIONS       CONTROL CONDITIONS
Condition                     L     LP     LP         ∅     ∅      ∅            LPS   LPS
Speaker’s knowledge           KN    IGNO   IGNO       KN    IGNO   IGNO         KN    IGNO
Truth value of the sentence
(card of the other player)    -     true   false      -     true   true/false   -     -

Experiment 1a (Examples: 18; Trials: 288; Total: 306)
SOME                          8     4      4          8     4      4            16    -
PLURAL                        8     4      4          8     4      4            16    -
TWO                           8     4      4          8     4      4            16    -
ALMOST ALL                    8     4      4          8     4      4            16    -
ALMOST NO                     8     4      4          8     4      4            16    -
ALL                           -     -      -          4     4      4            12    -
NO                            -     -      -          4     4      4            12    -

Experiment 1b (Examples: 18; Trials: 320; Total: 338)
SOME                          8     4      4          8     4      4            8     8
PLURAL                        8     4      4          8     4      4            8     8
TWO                           8     4      4          8     4      4            16    -
ALMOST ALL                    8     4      4          8     4      4            16    -
ALMOST NO                     8     4      4          8     4      4            16    -
ALL                           -     -      -          4     4      4            12    -
NO                            -     -      -          4     4      4            12    -
FIRST                         -     -      -          4     -      4            4     4
LAST                          -     -      -          4     -      4            4     4

Dual task experiment* (Examples: 4+4; Trials: 184; Total: 192)
SOME                          8     4      4          4     2      2            4     4
PLURAL                        8     4      4          4     2      2            4     4
TWO                           8     4      4          4     2      2            8     -
ALMOST ALL                    8     4      4          4     2      2            8     -
ALMOST NO                     8     4      4          4     2      2            8     -
ALL                           -     -      -          2     2      2            6     -
NO                            -     -      -          2     2      2            6     -
* to be divided by 2, according to the two Cognitive Load conditions
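As a consistency check on this appendix, the per-condition counts can be summed and compared with the number of trials reported for each experiment. The sketch below copies the counts from the table (a "-" is entered as 0); the variable and function names are hypothetical.

```python
# Trials per item, in the column order of the table:
# L-KN, LP-IGNO, LP-IGNO, ∅-KN, ∅-IGNO, ∅-IGNO, LPS-KN, LPS-IGNO ("-" entered as 0).
SCALAR_1A = [8, 4, 4, 8, 4, 4, 16, 0]
FILLER_1A = [0, 0, 0, 4, 4, 4, 12, 0]

EXPERIMENT_1A = {
    "SOME": SCALAR_1A, "PLURAL": SCALAR_1A, "TWO": SCALAR_1A,
    "ALMOST ALL": SCALAR_1A, "ALMOST NO": SCALAR_1A,
    "ALL": FILLER_1A, "NO": FILLER_1A,
}

EXPERIMENT_1B = {
    "SOME": [8, 4, 4, 8, 4, 4, 8, 8], "PLURAL": [8, 4, 4, 8, 4, 4, 8, 8],
    "TWO": SCALAR_1A, "ALMOST ALL": SCALAR_1A, "ALMOST NO": SCALAR_1A,
    "ALL": FILLER_1A, "NO": FILLER_1A,
    "FIRST": [0, 0, 0, 4, 0, 4, 4, 4], "LAST": [0, 0, 0, 4, 0, 4, 4, 4],
}

DUAL_TASK = {
    "SOME": [8, 4, 4, 4, 2, 2, 4, 4], "PLURAL": [8, 4, 4, 4, 2, 2, 4, 4],
    "TWO": [8, 4, 4, 4, 2, 2, 8, 0], "ALMOST ALL": [8, 4, 4, 4, 2, 2, 8, 0],
    "ALMOST NO": [8, 4, 4, 4, 2, 2, 8, 0],
    "ALL": [0, 0, 0, 2, 2, 2, 6, 0], "NO": [0, 0, 0, 2, 2, 2, 6, 0],
}

def total_trials(design: dict) -> int:
    """Sum the trial counts over all items and conditions of one experiment."""
    return sum(sum(counts) for counts in design.values())
```

Adding the example trials (18, 18 and 4+4) to `total_trials` for each design gives the totals listed in the table.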
6. Instructions (experiment 1a and 1b)
Peter and Mary are playing a card game.
At each round, eight cards are put on the table. Some cards can been seen by both Peter and Mary,
and some other cards can be seen only by Peter (or only by Mary).
After they have looked at the cards, Peter (or Mary) makes a statement about the cards.
Your task is to indicate if Peter (or Mary) could say what he (or she) said, on the basis of his (or her)
information.
Here are some examples:
Example 1: Mary: "All of the cards are spades."
Can Mary say that?
NO, because even if in fact the sentence is true, she does not have enough information to say that.
Example 2: Peter: "All of the cards are hearts."
Can Peter say that?
NO, because this is false (and he has enough information to know that it's false).
Example 3: Mary: "All of the cards are clubs."
Can Mary say that?
YES, because she can be sure the sentence is true.
Training with feedback:
Five unambiguous sentences were used, listed below. The order of the sentences was randomized.
- “All of the cards are [hearts].” (3)
- “The first* card is a [heart].” (3)
- “Fewer than five cards are [hearts].” (3)
- “There is the same number of [hearts] and [spades].” (3)
- “The last* card is a [heart].” (2)
7 sentences were attributed to Mary and 7 to Peter. The expected answer was “yes” for 5 of them, “no”
because of insufficient information for 5, and “no” because the sentence was false for 3.
*: In experiment 1b, as we used sentences with FIRST and LAST, we changed these examples.
7. Instructions (dual task)
Instructions (1/2) (Truth value judgment task)
This part of the instructions was the same as in experiment 1.
Training with feedback (Truth value judgment task)
Four unambiguous sentences were used, listed below. The order was randomized.
- “All of the cards are hearts.”
- “The second card is a club.”
- “Fewer than five cards are hearts.”
- “There is the same number of spades and diamonds.”
Instructions (2/2) (Letter Memory Task)
That's not all:
Before each of these questions, you will be shown random letters.
Remember them: after you have seen the cards and given your answer, you will be asked to
reproduce the same sequence of letters in reverse order.
For example, you may see ABCD, then you will answer a question about a round of cards, and then
you will be asked to reproduce the sequence of letters in reverse order, here: DCBA.
Please give your answer IN CAPITAL LETTERS, without leaving any space between the letters.
It is very important that you memorize correctly these letters: stay focused!
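As an illustration of what these instructions ask of participants, the expected response to a letter sequence is simply its reversal. The sketch below scores a response strictly (capital letters, no spaces), as the instructions request; how responses were actually scored in the experiment is not described in the source, so this strict scoring and the function names are assumptions.

```python
def expected_response(letters: str) -> str:
    """The sequence the participant should type: the letters in reverse order."""
    return letters[::-1]

def is_correct(letters: str, response: str) -> bool:
    """Strict scoring (an assumption): the response must match exactly,
    in capital letters and without spaces."""
    return response == expected_response(letters)
```

For the example given in the instructions, `expected_response("ABCD")` is "DCBA".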
Training with feedback (Letter Memory Task)
The same four sentences were used.
The number of letters presented in the training sequence depended on the order of the blocks for
the participant (LOW CL - HIGH CL vs HIGH CL – LOW CL).
8. Comparison between experiments
The figure below illustrates the proportion of the three readings across experiments, for each scalar
item.
LIT corresponds to the rate of “yes” answers in condition L.
PSI corresponds to the rate of “yes” answers in condition LP minus the rate of “yes” answers
in condition L.
SSI corresponds to the rate of “no” answers in condition LP.
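These three definitions amount to a small computation, sketched below. The function name is hypothetical, and the sketch assumes that answers are binary, so that the rate of “no” answers in condition LP is one minus the rate of “yes” answers.

```python
def reading_rates(yes_L: float, yes_LP: float) -> dict:
    """Proportions of the three readings, from the rates of "yes" answers
    in conditions L and LP (binary yes/no answers assumed)."""
    return {
        "LIT": yes_L,            # accepts the sentence even in condition L
        "PSI": yes_LP - yes_L,   # accepts it in LP but not in L
        "SSI": 1.0 - yes_LP,     # rejects it even in LP
    }
```

By construction the three proportions sum to 1. For example, with purely hypothetical rates of 0.25 in condition L and 0.75 in condition LP, the function returns LIT = 0.25, PSI = 0.5 and SSI = 0.25.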
Our results have been replicated across four experiments: we detect three distinct readings for
SOME and ALMOST, and only two for NUMERALS and PLURAL. There are small variations in the rates
of derivation of the different readings across experiments, but the overall pattern is always the same.
Factors accounting for the variability are diverse: subjects, other sentences tested in the experiment,
relative proportion of fillers and targets, etc.
References
Baddeley, A. (1992). Working memory. Science, 255(5044), 556-559.
Bale, A., & Barner, D. (2013). Grammatical alternatives and pragmatic development. Alternatives in Semantics,‘Studies in
Pragmatics, Language and Cognition’. Palgrave Macmillan. New York, 238-66.
Barner, D. & Bachrach, A. (2010). Inference and exact numerical representation in early language development. Cognitive
Psychology, 60, 40–62.
Barner, D., Brooks, N., & Bale, A. (2011). Accessing the unsaid: The role of scalar alternatives in children's pragmatic
inference. Cognition, 188, 87-96.
Bergen, L., & Grodner, D. J. (2012). Speaker knowledge influences the comprehension of pragmatic inferences. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 38, 1450-1460.
Bott, L., & Noveck, I. A. (2004). Some utterances are underinformative: The onset and time course of scalar
inferences. Journal of Memory and Language, 51(3), 437-457.
Bott, L., Bailey, T. M., & Grodner, D. (2012). Distinguishing speed from accuracy in scalar implicatures. Journal of Memory
and Language, 66(1), 123-142.
Breheny, R., Ferguson, H. J., & Katsos, N. (2013). Taking the epistemic step: toward a model of on-line access to
conversational implicatures. Cognition, 126(3), 423-440.
Breheny, R., Katsos, N., & Williams, J. (2006). Are generalised scalar implicatures generated by default? An on-line
investigation into the role of context in generating pragmatic inferences. Cognition, 100(3), 434-463.
Chemla, E., & Singh, R. (2014). Remarks on the experimental turn in the study of scalar implicature, Part I. Language and
Linguistics Compass, 8(9), 373-386.
Chemla, E., & Spector, B. (2011). Experimental evidence for embedded scalar implicatures. Journal of semantics, ffq023.
Chierchia, G. (2004). Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. Structures and
beyond, 3, 39-103.
Chierchia, G. (2006). Broaden your views: Implicatures of domain widening and the “logicality” of language. Linguistic
inquiry, 37(4), 535-590.
Chierchia, G., Crain, S., Guasti, Maria T., Gualmini, A., & Meroni, L. (2001). The Acquisition of Disjunction: Evidence for a
Grammatical View of Scalar Implicatures. Proceedings of the 25th Boston University Conference on Language Development.
Somerville: Cascadilla Press, 157-168.
Chierchia, G., Fox, D., & Spector, B. (2008). The grammatical view of scalar implicatures and the relationship between
semantics and pragmatics. Unpublished manuscript.
Chierchia, G., Fox, D., & Spector, B. (2009). Hurford’s constraint and the theory of scalar implicatures. Presuppositions and
implicatures, 60, 47-62.
De Neys, W. (2006). Dual processing in reasoning: two systems but one reasoner. Psychological Science, 17(5), 428-433.
De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology
(formerly Zeitschrift für Experimentelle Psychologie), 54(2), 128-133.
Degen, J., & Tanenhaus, M. K. (2011). Making inferences: the case of scalar implicature processing. In Proceedings of the
33rd annual conference of the cognitive science society (pp. 3299-3304). Austin, TX: Cognitive Science Society.
Dieussaert, K., Verkerk, S., Gillard, E., & Schaeken, W. (2011). Some effort for some: further evidence that scalar
implicatures are effortful. The Quarterly Journal of Experimental Psychology, 64(12), 2352-2367.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1), 19-23.
Form, L. (1995). Plurality, conjunction and events.
Fox, D. (2007). Free choice and the theory of scalar implicatures. In Uli Sauerland and Penka Stateva (Eds.), Presupposition
and implicature in compositional semantics. Houndmills, Basingstoke: Palgrave Macmillan.
Fox, D. (2014). Cancelling the Maxim of Quantity: Another challenge for a Gricean theory of scalar implicatures. Semantics
and Pragmatics, 7, 5-1.
Gazdar, G. (1979). Pragmatics: Implicature, presupposition, and logical form. New York: Academic Press.
Geurts, B. (2009). Scalar implicature and local pragmatics. Mind & Language, 24(1), 51-79.
Grice, P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics, Volume 3. New York: Academic
Press
Grodner, D. J., Klein, N. M., Carbary, K. M., & Tanenhaus, M. K. (2010). “Some,” and possibly all, scalar inferences are not
delayed: Evidence for immediate pragmatic enrichment. Cognition, 116(1), 42-55.
Hirschberg, J. L. B. (1985). A theory of scalar implicature. University of Pennsylvania.
Hitzeman, J. (1992). The selectional properties and entailments of almost. In Papers from the 28th Regional Meeting of the
Chicago Linguistic Society (pp. 225-238).
Hochstein, L., Bale, A., Fox, D., & Barner, D. (2014). Ignorance and Inference: Do Problems with Gricean Epistemic Reasoning
Explain Children’s Difficulty with Scalar Implicature?. Journal of Semantics, ffu015.
Horn, L. (1972). On the Semantic Properties of Logical Operators in English. Ph.D. dissertation. University of California. Los
Angeles, CA.
Horn, L. R. (2011). Almost forever. Pragmatics and autolexical grammar: In honor of Jerry Sadock, 3-21.
Huang, Y. T., & Snedeker, J. (2009). Online interpretation of scalar quantifiers: Insight into the semantics–pragmatics
interface. Cognitive psychology, 58(3), 376-415.
Huang, Y. T., & Snedeker, J. (2009). Semantic meaning and pragmatic interpretation in 5-year-olds: evidence from real-time
spoken language comprehension. Developmental psychology, 45(6), 1723.
Kilbourn-Ceron, O. (2015). Embedded exhaustification: evidence from almost.
Landman, F. (1998). Plurals and maximalization (pp. 237-271). Springer Netherlands.
Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicature. Cambridge, MA: MIT
press.
Marty, P., & Chemla, E. (2013). Scalar implicatures: working memory and a comparison with only. Frontiers in psychology, 4.
Marty, P., Chemla, E., & Spector, B. (2013). Interpreting numerals and scalar items under memory load. Lingua, 133, 152-163.
Miyake, A., & Shah, P. (Eds.). (1999). Models of working memory: Mechanisms of active maintenance and executive control.
Cambridge University Press.
Noveck, I. (2001), When children are more logical than adults: Experimental investigations of scalar implicature. Cognition,
78, 165-188.
Noveck, I. A., & Posada, A. (2003). Characterizing the time course of an implicature: An evoked potentials study. Brain and
language, 85(2), 203-210.
Papafragou, A. & Musolino, J. (2003). Scalar implicatures: experiments at the semantics- pragmatics interface. Cognition,
86: 253-282.
Pearson, H. A., Khan, M., & Snedeker, J. (2011, March). Even more evidence for the emptiness of plurality: An experimental
investigation of plural interpretation as a species of implicature. In Proceedings of SALT (Vol. 20, pp. 489-507).
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. URL http://www.R-project.org/.
Rips, L. J. (1975). Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior, 14(6), 665-681.
Russell, B. (2006). Against grammatical computation of scalar implicatures. Journal of Semantics, 23(4), 361-382.
Sadock, J. (1981). Almost. Radical pragmatics, 257-271.
Sauerland, U. (2004). Scalar Implicatures in Complex Sentences. Linguistics and Philosophy, 27, 367-391
Sauerland, U. (2012). The computation of scalar implicatures: pragmatic, lexical or grammatical?. Language and Linguistics
Compass, 6(1), 36-49.
Sauerland, U., Anderssen, J., & Yatsushiro, K. (2005). The plural is semantically unmarked. Linguistic evidence, 413-434.
Smith, C. L. (1980). Quantifiers and question answering in young children. Journal of Experimental Child Psychology, 30(2),
191-205.
Spector, B. (2003). Scalar Implicatures: Exhaustivity and Gricean Reasoning. Proceedings of the ESSLLI, 3, 277-288.
Spector, B. (2007). Aspects of the pragmatics of plural morphology: On higher-order implicatures. Presuppositions and
implicatures in compositional semantics, 243-281.
Spector, B. (2013). Bare numerals and scalar implicatures. Language and Linguistics Compass, 7(5), 273-294.
Tieu, L., Bill, C., Romoli, J., & Crain, S. (2014, September). Plurality inferences are scalar implicatures: Evidence from
acquisition. In Semantics and Linguistic Theory (Vol. 24, pp. 122-136).
Tomlinson, J., Bott, L., & Bailey, T. (2011). Understanding literal meanings before pragmatic inference: Mouse-trajectories of
scalar implicatures. In th Biennial Conference of Experimental Pragmatics, June (pp. 2-4).
Van Rooij, R., & Schulz, K. (2004). Exhaustive interpretation of complex sentences. Journal of logic, language and
information, 13(4), 491-519.
Van Tiel, B., Van Miltenburg, E., Zevakhina, N., & Geurts, B. (2014). Scalar diversity. Journal of Semantics, ffu017.
Zweig, E. (2008). Dependent plurals and plural meaning. ProQuest.