Supplemental Materials
“Slow Down and Remember to Remember!
A Delay Theory of Prospective Memory Costs”
by A. H. Heathcote et al., 2015, Psychological Review
http://dx.doi.org/10.1037/a0038952.supp
Section A: Supplementary Analyses of Experiment 1
Figure A1 shows that the fit of the top LBA model is not markedly better than that of the
AIC-selected model shown in Figure 3. We used a mixed ANOVA to examine effects on the
top LBA model parameters, including block order as a between-subjects factor. Results for all
analyses except on the threshold were consistent with the AIC-selected model analyses, with
the exception that for Ter the longer non-decision time in PM (0.172s) than control (0.162s)
blocks just achieved significance, F(1,45) = 4.88, p = .032. The average A estimate was 0.21
and the average sv estimate for the true accumulator was 0.6, which was significantly lower than
the fixed value of one for the false accumulator, F(1,45) = 155, p < .001. However, in
contrast to both the model selection results and the analysis of the AIC-selected model
parameters, no effects on B involving PM were significant.
Delay Theory of Prospective Memory Costs
Figure A1. Top LBA model fits. Data are plotted accompanied by within-subject error bars calculated using Morey’s (2008) bias-corrected method.
Figure A2. Mean rate estimates for the top LBA model averaged over participants accompanied by within-subject error bars calculated using Morey’s (2008) bias-corrected method.
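The within-subject error bars in these figure captions follow Morey's (2008) bias-corrected version of Cousineau's normalization method. A minimal sketch (the participants-by-conditions array layout is an assumption, not the authors' code):

```python
import numpy as np

def morey_within_se(cell_means):
    # cell_means: participants x conditions array of per-cell means
    n, m = cell_means.shape
    # Cousineau normalization: remove each participant's overall mean,
    # then add back the grand mean
    normed = cell_means - cell_means.mean(axis=1, keepdims=True) + cell_means.mean()
    # Morey's (2008) bias correction: inflate the variance by m / (m - 1)
    corrected_var = normed.var(axis=0, ddof=1) * m / (m - 1)
    return np.sqrt(corrected_var / n)  # standard error per condition
```

Because between-subject offsets are removed, participants who differ only in overall speed contribute nothing to the bars; when every participant shows an identical condition effect the bars shrink to zero.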
Figure A2 shows average estimates of the top LBA model v parameter. Consistent
with accurate responding, the mean rate for the true (2.35) accumulator was much greater than
that for the false (0.48) accumulator, F(1,45) = 290, p < .001. This difference interacted strongly
with stimulus type, F(2,45) = 63.8, ε = .77, p < .001, being largest for HF (2.8), smallest for LF
(0.7) and intermediate for NW (2), in accord with the ordering of accuracy by stimulus type.
There was a significant main effect of PM on mean rate, F(1,45) = 10.7, p < .01, with a
higher average rate in control (1.46) than PM (1.37) blocks. There was also a significant
interaction between block type and accumulator correspondence, F(1,45) = 16.2, p < .001, due
to a greater true vs. false accumulator difference in PM (2) than control (1.7). As shown in
Figure A2, these effects were largely due to a decrease in false accumulator rates in PM
blocks for all stimulus types.
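How these parameters jointly produce responses can be illustrated with a toy simulation of a single LBA race, using round values near the estimates above (A = 0.21, true-accumulator sv = 0.6, false-accumulator sv fixed at one); the threshold b and Ter values here are illustrative assumptions, and the guard against negative rate draws is a simplification of the full model:

```python
import numpy as np

rng = np.random.default_rng(1)

def lba_trial(v_true=2.35, v_false=0.48, A=0.21, b=1.0,
              sv=(0.6, 1.0), ter=0.17):
    # Each accumulator starts at a uniform point in [0, A], draws a rate
    # from a normal distribution, and races to threshold b; the winner
    # determines the response, and Ter adds non-decision time.
    rates = rng.normal([v_true, v_false], sv)
    rates = np.maximum(rates, 1e-6)   # crude guard against negative draws
    starts = rng.uniform(0.0, A, size=2)
    times = (b - starts) / rates
    winner = int(np.argmin(times))
    return ter + times[winner], winner == 0   # RT, correct?

results = [lba_trial() for _ in range(2000)]
accuracy = float(np.mean([correct for _, correct in results]))
```

With the true-accumulator rate far above the false one, accuracy is high; lowering v_false further, as estimated for PM blocks, widens the true vs. false gap.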
Section B: Reanalysis of Horn et al. (2011)
In Smith’s (2003) experiment reanalysed by Horn et al. (2011) there were 126 word
and 126 nonword stimuli, with a total of 504 trials, as each stimulus was tested twice. We
analysed the same ongoing-task data as Horn et al. (2011), removing trials with responses
faster than 0.3s or slower than 3s (1.27%). We first checked for differences in accuracy and
mean RT for correct responses as a function of stimulus type and repetition (i.e., stimulus
specific practice), as well as the PM effect that was the focus of Horn et al.’s (2011) analysis.
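The censoring step described above can be sketched as a simple filter (the trial-record layout is hypothetical):

```python
def censor_rts(trials, lo=0.3, hi=3.0):
    # Keep only trials whose RT falls inside the [0.3 s, 3 s] window
    # used in the text; report the proportion removed.
    kept = [t for t in trials if lo <= t["rt"] <= hi]
    return kept, 1.0 - len(kept) / len(trials)

# Hypothetical example trials
trials = [{"rt": r} for r in (0.25, 0.45, 1.10, 2.90, 3.40)]
kept, removed = censor_rts(trials)
```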
For correct mean RT, all main effects and all two-way interactions between these three
factors were significant (ps < .001). As reported by Horn et al., the No-PM condition was
0.178s faster than the PM condition. The other factors also had strong and significant effects.
Repetition speeded responding by 0.109s, and nonwords were faster than words by 0.015s.
Stimulus type interacted strongly with repetition, as nonwords were actually 0.017s slower
than words on the first presentation but 0.047s faster than words on the second presentation.
Repetition also interacted with the PM effect, which was 0.207s for the first presentation but
reduced to 0.147s for the second presentation.
Error rates were near floor, but even so word responses (2.1%) were significantly
more accurate than nonword responses (2.7%), F(1,93)=7.79, p < .01, with no other
significant effects. There was a small but significant bias towards nonwords, t(94) = 2.11, p =
.04, with 49% word responses (relative to 49.7% word stimuli after removing fast and slow
RT trials in the same way as Horn et al., 2011), which was affected by repetition, F(1,93) =
5.97, p < .05. The bias was most prevalent for the second presentation (48.7%), and for the
first presentation (49.3%) it did not differ significantly from chance, F < 1. Error responses
were significantly slower than correct responses in the PM condition for nonwords, t(107) =
3.21, p < .01, but not in the No-PM condition, nor for words in the PM and No-PM
conditions. Given their significant effects we allowed for response bias, stimulus type and
repetition effects in our model-based reanalysis.
We also checked whether there were effects associated with practice at the task (as
opposed to stimulus-specific practice associated with repeated stimulus presentations) using a
factor that divided the first and second stimulus repetition blocks in half. For correct RT there
was a significant main effect of half, F(1,93) = 62.5, p < .001, which interacted with
repetition, F(1,93) = 17.1, p < .001, due to a decrease by 0.09s over halves for the initial
presentation and a smaller decrease of 0.04s over halves for the second presentation. There
was also a significant two-way interaction between half and stimulus type, F(1,93) = 34.6, p
< .001, and a three-way interaction also including repetition, F(1,93) = 22.6, p < .001, due to
a particularly large decrease during the first repetition for nonwords (0.193s) but similar
decreases for words for both first and second presentations (0.039s and 0.027s respectively)
and for nonwords on the second presentation (0.033s). One method of addressing this potentially
problematic practice effect might be to analyse only second repetition data where the block-
half effect was weaker. Although the simple effect of half for the second repetition remained
significant for both correct mean RT, F(1,93) = 15.1, p < .001, and errors, F(1,93) = 10.2, p <
.01, at least half did not participate in any significant interactions. Hence, we decided to also
examine fits of the models to the second repetition data as a further check.
Model Analysis. In our analysis of the full data set the top RD model allowed v and sv
to vary with stimulus type (S: word vs. nonword) and repetition (r, 1st vs. 2nd) and z and sz to
vary with repetition. Because fits were to individual participants and the block type (PM)
factor was between-subjects it was not part of the model specification. The top LBA model
allowed B to vary with accumulator (R, word vs. nonword) and repetition (r) and v to vary
with these factors and accumulator correspondence (C, true vs. false), and sv to vary with C
for the corresponding (true) accumulator. For the fits to the 2nd repetition data the same top
models were specified with the repetition effects removed. Tables B1 and B2 show model selection
results. For the full data set the top RD model provided a better fit than the top LBA model and
had the best AIC, but the LBA model had the best BIC. For fits to only the
2nd repetition data the LBA model won by all criteria.
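The selection criteria reported in Tables B1 and B2 follow the standard definitions based on deviance (D) and free-parameter count (p). As a generic sketch (the sample size n for BIC is whatever number of data points the fit was based on; the tabled values are summed over individually fitted participants, so they will not reproduce from a single call):

```python
import math

def aic(deviance, p):
    # AIC = D + 2p
    return deviance + 2 * p

def bic(deviance, p, n):
    # BIC = D + p * ln(n)
    return deviance + p * math.log(n)
```

Lower values are better on both criteria; BIC's ln(n) term penalizes extra parameters more heavily than AIC once n exceeds about 7, which is why BIC selects sparser models in the tables.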
The fast-dm method of Voss and Voss (2007) used by Horn et al. (2011) and Boywitt
and Rummel (2012) to fit their data, like the method of maximum likelihood we used to
check our LBA fits, does not involve reducing the data to a set of relatively coarse percentiles
(i.e., 10th, 30th, 50th, 70th and 90th), as is standard practice for the RD model (Ratcliff &
Tuerlinckx, 2002). Van Ravenzwaaij and Oberauer's (2009) finding that fast-dm does not recover
parameters as well as QMPE is likely due to its use of the Kolmogorov-Smirnov (KS) method,
which minimizes the maximum deviation between data and model. By focusing only on the
maximum deviation, the KS method does not use all of the information available in the data, but the
same is also true for QMPE with a coarse set of quantiles (Heathcote et al., 2002). In order to
check whether our results differed because we used the standard set of five quantiles we also
performed QMPE fits at a much finer grain using the 19 semi-deciles (i.e., the 5th, 10th, … 95th
percentiles). Our results were essentially equivalent with both methods for all data sets
considered in this paper.
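The difference between coarse and fine quantile sets can be made concrete with a sketch of the QMPE objective: bin the observed RTs at the chosen quantiles and score a model by the multinomial log-likelihood of the bin counts under its predicted CDF. The shifted-exponential model CDF below is purely illustrative, not an RT model used in the paper:

```python
import numpy as np

def qmpe_loglik(rts, model_cdf, probs):
    # QMPE objective: log-likelihood of observed inter-quantile bin
    # counts under the model's predicted cumulative distribution.
    rts = np.sort(np.asarray(rts))
    n = len(rts)
    cuts = np.quantile(rts, probs)
    idx = np.searchsorted(rts, cuts, side="right")
    counts = np.diff(np.concatenate(([0], idx, [n])))   # observed bin counts
    cdf = np.concatenate(([0.0], model_cdf(cuts), [1.0]))
    bin_p = np.clip(np.diff(cdf), 1e-10, None)          # model bin masses
    return float(np.sum(counts * np.log(bin_p)))

standard = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # the usual 5 quantiles
semi_deciles = np.linspace(0.05, 0.95, 19)       # the finer 19-quantile set

rng = np.random.default_rng(0)
rts = rng.exponential(scale=0.5, size=2000) + 0.3    # toy shifted-exponential RTs
true_cdf = lambda t: 1.0 - np.exp(-np.maximum(t - 0.3, 0.0) / 0.5)
ll = qmpe_loglik(rts, true_cdf, semi_deciles)
```

More quantiles means more bins, so the objective is sensitive to finer-grained distributional shape.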
Table B1. RD models for Smith (2003).

Data        Model      v     sv    z   sz   p    D      AIC    BIC
All         Top Model  S, r  S, r  r   r    15   9587   12437  22804
            AIC Model  S     S     -   r    14   9764   12424  22100
            BIC Model  r     r     -   r    10   11088  12908  19819
2nd Repeat  Top Model  S     S     -   -    9    5006   6716   12359
            AIC Model  S     S     -   -    9    5006   6716   12359
            BIC Model  r     r     -   -    7    5514   6844   11233

Table B2. LBA models for Smith (2003).

Data        Model      B     v        sv       p    D      AIC    BIC
All         Top Model  R, r  S, r, C  C(true)  15   9599   12449  22815
            AIC Model  R, r  S, C     C(true)  11   10354  12444  20046
            BIC Model  R, r  C        C(true)  9    10959  12669  18889
2nd Repeat  Top Model  R     S, C     C(true)  9    4583   6293   11936
            AIC Model  R     S, C     C(true)  9    4583   6293   11936
            BIC Model  R     C        C(true)  7    5039   6369   10758

We report the semi-decile based results for the two re-analyses. For the Horn et al.
(2011) re-analysis the correct RT panels in Figure B1 display the semi-decile results (only a
subset is shown for graphical clarity). Figure B1 plots the fit
of both top models to the 2nd repetition data and Figure B2 plots fits to the full data set. Both
the LBA and RD models capture all of the fine-grained trends in correct RT quite accurately,
but the LBA does a better job in capturing the error-related effects. Note that as errors are
relatively rare in Smith’s (2003) data (less than 3% in all conditions) this misfit only has a
small impact on overall goodness-of-fit as measured by the deviance.
Figure B1. Top RD model fits (left two columns) and top LBA model fits (right two columns) to the 2nd repetition portion of Smith’s (2003) data.
The RD fits to the 2nd repetition data confirmed, but also extended, Horn et al.’s
(2011) conclusions. The threshold (a) was higher for PM (0.25) than No-PM (0.21)
participants, F(1,93) = 5.04, p < .05. Similarly, the mean rate (v) was significantly lower for
PM (0.32) than No-PM (0.38) participants, F(1,93) = 6.24, p < .05. However, we also found
that response bias (z/a) varied between PM (0.48) and No-PM (0.54) participants, F(1,93) =
6.35, p < .05. The slowing for words is clearly evident in the PM condition correct RT panels
of Figure B1, and is not due to mean rate, which was almost identical for word (0.35) and
nonword (0.34) stimuli, F < 1. All other effects also failed to achieve significance.
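How the RD parameters just discussed (a, v, z/a, Ter) generate behaviour can be sketched with a naive Euler simulation of a single diffusion trial, using round values near the estimates above and the conventional within-trial noise s = 0.1; this is an illustrative sketch, not the fitting code:

```python
import numpy as np

rng = np.random.default_rng(7)

def rd_trial(v=0.35, a=0.23, z_rel=0.5, ter=0.45, s=0.1, dt=1e-3):
    # Evidence starts at z = z_rel * a and drifts at rate v; hitting the
    # upper boundary a gives one response (e.g. word), hitting 0 the other.
    x, t = z_rel * a, 0.0
    while 0.0 < x < a:
        x += v * dt + s * np.sqrt(dt) * rng.normal()
        t += dt
    return ter + t, x >= a   # RT and whether the upper boundary won

trials = [rd_trial() for _ in range(200)]
p_upper = float(np.mean([upper for _, upper in trials]))
```

Raising a slows responding but improves accuracy, while shifting z_rel below 0.5, as estimated for the PM group, biases responses toward the lower boundary and slows upper-boundary responses.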
Figure B2. Top RD model fits (left two columns) and top LBA model fits (right two columns) to Smith’s (2003) data.
It is instructive to examine the results of fits to the full data set to investigate the
consequences of aggregating over the strong practice effects present over the course of the 1st
repetition. A rather different picture emerged. Although mean rate was still less for PM (0.42)
than control (0.52) participants, F(1,93) = 7.45, p < .01, and response bias was less for PM
(0.47) than control (0.54) participants, F(1,93) = 8.4, p < .01, PM had no effect on the
threshold, F<1. Significant non-decision time effects also emerged, including both a longer
mean (Ter) for PM (0.49s) than control (0.44s), F(1,93) = 8.1, p < .01, and more variability
(st) for PM (0.24s) than No-PM (0.16s), F(1,93) = 6.48, p < .05. Strong effects of repetition
were also found on start-point variability (sz), F(1,93) = 15.9, p < .001, mean rate, F(1,93) =
40, p < .001, and rate variability (sv), F(1,93) = 33.6, p < .001, as well as significant effects
of stimulus type on mean rate, F(1,93) = 5.4, p < .05, and rate variability, F(1,93) = 4.07, p
< .05.
Further caution is warranted for the LBA parameter analysis. For the 2nd presentation
data no effects on LBA parameters were significant. There was a trend for a higher threshold
in PM (0.97) than No-PM (0.71) participants, but it did not approach significance, F(1,93) =
1.69, p = 0.2. There was also a trend for the mean rate to be higher for PM (0.46) than No-
PM (0.31), rather than lower as might be expected, but again the difference did not approach
significance, F < 1. The likely reason for the lack of significant effects, given the LBA model
provides a very good fit to the 2nd presentation data, relates to a combination of factors that
can lead to unconstrained (and so highly variable) individual participant parameter estimates,
including the smaller sample size attendant to examining only the 2nd repetition data and a
low error rate. Low error rates are particularly problematic for LBA models that allow
separate mean rates for true and false accumulators, as the rate for the false accumulator is
largely determined by error RT data. The same issue does not apply to the RD model because
no parameter is mainly dependent on error RTs.
Consistent with a small sample size being problematic for obtaining precise individual
participant estimates, many significant and close to significant effects emerged in the LBA
parameters for the top model fit to the full data set. In mean rates there were significant
interactions of PM with stimulus type, F(1,93) = 7.31, p < .01 and with stimulus type and
accumulator correspondence (C), F(1,93) = 5.76, p < .05. There was also a marginal
interaction of PM with repetition in response caution (B), F(1,93) = 3.38, p = .069, and in sv
there was a PM main effect, F(1,93) = 3.45, p = .067, and an interaction between PM and C,
F(1,93) = 3.45, p = .067. We do not provide any further detail of these likely spurious effects.
Section C: Reanalysis of Boywitt and Rummel (2012)
For their first experiment Boywitt and Rummel (2012) found that only the RD
threshold (a) parameter differed significantly between high and low expectancy conditions,
with a lower value in the latter condition. They interpreted this finding as validating a
prediction made by the RD model that expectancy manipulations selectively influence the a
parameter (Voss et al., 2004). In the second experiment a and non-decision time (Ter) were
significantly greater and mean rate (v) significantly less in the demanding group compared to
the other two groups, which did not differ. Boywitt and Rummel interpreted the effects on a
an v similarly to Horn et al. (2011), and the Ter effect as indicative of slowed encoding of
color due to an increased engagement in capacity-consuming strategic monitoring for the PM
cue.
An initial inspection indicated that the re-analysis of Boywitt and Rummel’s (2012)
data would be difficult because of a combination of a relatively small sample size of less than
170 trials per participant and substantial and extended practice effects. To quantify the latter
we divided trials into quarters. In experiment one correct mean RT decreased over the
quarters (1.46s, 1.4s, 1.34s, 1.22s), F(3,174) = 3.76, ε = .69, p < .05, with the decrease
accelerating between the final two quarters and remaining significant there, F(1,58) = 4.84, p < .05.
Given that the latter result indicates our strategy of using the last half of the data would not be
effective, and that the manipulation in experiment one did not include a control group to test PM
cost, our further analyses focus on experiment two.
Similar practice effects on mean RT over quarters occurred in experiment two
(1.241s, 1.18s, 1.115s, 1.061s), F(3,246) = 15.6, ε = .92, p < .001. The practice (quarters)
effect interacted with the PM manipulation, F(6,246) = 6.98, ε = .95, p < .001. However, at
least in the latter half the practice effect was no longer significant, F < 1, and the interaction
with PM was only marginal, F(2,82) = 2.5, p = .09. Error rates were marginally affected by
quarters (10.7%, 10.2%, 10.8%, 11.7%), F(3,246) = 2.61, ε = .95, p = .052, and there was a
marginal interaction with PM, F(6,246) = 1.83, ε = .95, p = .098, but both dropped out, F < 1,
over the last half.
Although the analysis of the last half of the data presents problems in terms of
obtaining precise parameter estimates we decided to proceed with a model analysis of the last
half data set as well as the full data set. Before doing so we note that although stimulus type
did not have a significant effect on mean correct RT, F < 1, it did have a very marked effect
on accuracy, with matching responses (16.2% errors) much less accurate than mismatching
responses (5.5% errors), F(1,82) = 20.96, p < .001, for the full data set, and similarly for the last
half (17.3% vs. 5.2% errors), F(1,82) = 20.94, p < .001. There was also a strong response bias towards
mismatching responses, t(84) = 6.99, p < .001, with the overall probability of a matching
response only 0.447. Hence, we allowed for both stimulus type (S) and response bias in our
model analysis.
Model Analysis. As shown in Tables C1 and C2 we specified the same top models
(for both the full data set and the last half) for Boywitt and Rummel's (2012) experiment two
as we did for the 2nd repeat analyses of Horn et al.'s (2011) data, except that the stimulus type
factor (S) refers to matching vs. mismatching stimuli. The LBA model was preferred in all
cases in terms of goodness-of-fit and both AIC and BIC model selection. However Figure C1
shows that both models did an equally good job of capturing the qualitative trends in the full
data set. As shown in Figure C2 the same is true for the 2nd half only. In all cases the top
model was selected by AIC, and the BIC model fit significantly worse (RD: all, χ²(170) =
782, p < .001; 2nd half, χ²(170) = 545, p < .001; LBA: all, χ²(255) = 782, p < .001; 2nd half,
χ²(340) = 1272, p < .001). Hence we focus on the top model in our parameter analyses.
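These χ² statistics are deviance differences between nested models, with degrees of freedom equal to the per-participant parameter difference multiplied by the number of participants (85 here, consistent with the t(84) test reported earlier). A sketch:

```python
def lr_stat(dev_restricted, dev_full, p_restricted, p_full, n_subjects):
    # Likelihood-ratio statistic for nested models fit to each participant:
    # the summed-deviance difference, asymptotically chi-square distributed
    # with df = per-participant parameter difference * number of participants.
    stat = dev_restricted - dev_full
    df = (p_full - p_restricted) * n_subjects
    return stat, df

# Reproduces the RD full-data-set comparison from Table C1
stat, df = lr_stat(4999, 4217, 7, 9, 85)
```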
Unfortunately using only the second half of the data reduced power to such a degree
that few effects on parameters were significant. In the RD analysis there was a higher value
of sv for match (0.12) than mismatch (0.03), F(1,82) = 7.97, p < .01. In the LBA analysis
there was a main effect of stimulus type, with a higher value for match (2.11) than mismatch
(1.46), F(1,82) = 4.69, p < .05, which interacted with accumulator correspondence, due to a
smaller difference between true and false for match (12.11) than mismatch (4.25), F(1,82) =
11.3, p < .001. Both effects correspond to the largest effect observed in these data, greater
mismatch than match accuracy. Hence, we focus on effects in fits to the full data set,
commenting on corresponding trends in the 2nd half analysis where appropriate.
Table C1. RD models for Boywitt and Rummel (2012) experiment two.

Data      Model      v    sv   p    D     AIC    BIC
All       Top Model  S    S    9    4217  5747   10831
          AIC Model  S    S    9    4217  5747   10831
          BIC Model  -    -    7    4999  6189   10143
2nd Half  Top Model  S    S    9    3126  4656   9650
          AIC Model  S    S    9    3126  4656   9650
          BIC Model  -    -    7    3671  4861   8746

Table C2. LBA models for Boywitt and Rummel (2012) experiment two.

Data      Model      B    v     sv       p    D     AIC    BIC
All       Top Model  R    S, C  C(true)  9    3879  5409   10493
          AIC Model  R    S, C  C(true)  9    3879  5409   10493
          BIC Model  R    C     C(true)  6    4843  5863   9252
2nd Half  Top Model  R    S, C  C(true)  9    2887  4417   9412
          AIC Model  R    S, C  C(true)  9    2887  4417   9412
          BIC Model  R    C     C(true)  5    4159  5509   7784

Figure C1. Top RD model fits (left two columns) and top LBA model fits (right two columns) to Boywitt and Rummel's (2012) Experiment 2 full data set.
Figure C2. Top RD model fits (left two columns) and top LBA model fits (right two columns) to the second half of Boywitt and Rummel’s (2012) Experiment 2 data set.
In the RD analysis the threshold was higher for the demanding condition (0.23) than
the remaining PM conditions (0.19), F(2,82) = 6.1, p < .01, whereas in the 2nd half analysis
demanding (0.21) and non-demanding (0.22) thresholds were similar. Ter was longer by 0.1s
in the demanding PM condition than the other conditions, F(2,82) = 3.65, p < .05, and the
same was true in the 2nd half analysis, by 0.07s. Similarly, st was greater by 0.15s in the
demanding PM condition than the other conditions, F(2,82) = 3.17, p < .05, but no
appreciable trend was apparent in the 2nd half analysis. There was a significant interaction
between stimulus type and PM in mean rate, F(2,82) = 4.0, p < .05, with the average of the
PM conditions 0.55 less than the control condition for match, whereas only the demanding
PM condition had a lower rate, by 0.35 on average, for mismatch. For the 2nd half analysis no
interaction was evident, but there was a trend for demanding (0.18) to be less than non-
demanding (0.245) and control (0.22), F(2,82) = 2.76, p = 0.15. For sv the same strong effect
of stimulus type as in the 2nd half analysis was present, F(1,82) = 30.5, p < .001.
In the LBA analysis of the full data set no effects were significant except on mean
rate, where the main effect of accumulator correspondence was significant, F(1,82) = 41.6, p
< .001. The interaction of this effect with stimulus type was marginal, F(1,82) = 3.76, p = .056,
following the same pattern as in the 2nd half analysis: a smaller true vs. false difference for
match (1.75) than mismatch (3.49).
Section D: Top Model Fits for Lourenço et al. (2013)
Figure D1. Top RD model fits to Lourenço et al. (2013) specific and non-specific conditions. Data are plotted accompanied by within-subject error bars calculated using Morey’s (2008) bias-corrected method.
Figure D2. Top LBA model fits to Lourenço et al. (2013) specific and non-specific conditions. Data are plotted accompanied by within-subject error bars calculated using Morey’s (2008) bias-corrected method.