Supplemental Materials
“Does Calculation or Word-Problem Instruction Provide a Stronger Route to Prealgebraic Knowledge?”
by L. S. Fuchs et al., 2014, Journal of Educational Psychology
http://dx.doi.org/10.1037/a0036793
Study Conditions
The study conditions were business-as-usual control, 2-tiered CAL intervention, and 2-
tiered WP intervention. Tier 1 was 34 whole-class intervention lessons (2 lessons per week for
17 weeks; 40-45 min per lesson) for all students in the class. Researcher-delivered whole-class
instruction substituted for ~185 of ~300 min of classroom teachers’ weekly business-as-usual
mathematics instruction.
Tier 2 was 39 tutoring lessons (3 times per week for 13 weeks, beginning in Weeks 4-5 of
Tier 1 instruction; 2-3 children per group; 25-30 min per lesson) provided to 272 students. The
benchmark for low performance to determine eligibility for tutoring was <7 on calculation and
word-problem screeners. This yielded 320 students who were eligible for tutoring. In Cohorts 1
and 4, we accommodated more students due to additional resources (as typically done in RTI).
So CAL students scoring <7 on calculations but >7 on word problems also were eligible, as were
WP students scoring <7 on word problems but >7 on calculations. Also, in each cohort, before
finalizing tutoring decisions, we asked teachers to confirm the appropriateness of selections
based on classroom observations. With this teacher input, 50 students who were eligible for
tutoring did not receive it, and 2 students who were not eligible did receive it. So although 320
students were eligible for tutoring according to the benchmarks we had set, 272 students received
tutoring: 84 in Cohort 1 (42 in CAL; 42 in WP); 50 in Cohort 2 (25 in CAL; 25 in WP); 72 in
Cohort 3 (34 in CAL; 38 in WP); and 66 in Cohort 4 (34 in CAL; 32 in WP).
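The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in this section; the variable names are illustrative and are not part of the study materials.

```python
# Figures are taken from the text above; names are illustrative only.
eligible = 320             # students meeting the tutoring benchmark
eligible_not_tutored = 50  # eligible, but not tutored after teacher input
ineligible_tutored = 2     # not eligible, but tutored after teacher input

tutored = eligible - eligible_not_tutored + ineligible_tutored

# (CAL, WP) tutored students per cohort
cohorts = {1: (42, 42), 2: (25, 25), 3: (34, 38), 4: (34, 32)}
cal_total = sum(cal for cal, _ in cohorts.values())
wp_total = sum(wp for _, wp in cohorts.values())

print(tutored, cal_total, wp_total, cal_total + wp_total)  # 272 135 137 272
```

Both routes give 272 tutored students, which matches the total stated in the text.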
In this section, we describe (a) the framework for CAL and WP intervention, with
linkages in each program to pre-algebraic knowledge; (b) the nature of control group instruction
and distinctions between control and the intervention conditions; (c) the structure of whole-class
CAL and WP instruction and then CAL and WP tutoring; (d) the research assistant (RA) teachers
and tutors and how they were prepared and supported; and (e) fidelity of implementation.
Program manuals, which include lesson guides, are available from the first author, under the title
Math Wise for CAL whole-class instruction and tutoring and Pirate Math for WP whole-class
instruction and tutoring (Math Wise because the theme is owls, with allusions to becoming
“wiser” in mathematics by using appropriate calculation strategies; Pirate Math because the
theme is pirates, with allusions to finding x, the unknown, in word problems, just as x marks the
treasure on pirate maps).
Framework for CAL Intervention
CAL intervention incorporated two major emphases that reflect understanding about how
children develop competence with simple arithmetic and procedural calculations (e.g., Fuchs,
Geary, et al., 2013; Fuson & Kwon, 1992; Geary et al., 2008; Groen & Resnick, 1977; LeFevre
& Morris, 1999; Siegler & Shrager, 1984). The first emphasis was interconnected knowledge
about number (e.g., cardinality, inverse relation between addition and subtraction;
commutativity). For example, students used manipulatives to explore how a target number (e.g.,
5 in the 5 set) can be partitioned in different ways to derive adding and subtracting problems in a
set. They focused on part-whole knowledge with number families (problems in a set with the
same three numbers, e.g., 2 + 3=5, 3 + 2=5, 5 - 2=3, 5 - 3=2), grouping families and using visual
displays/blocks to show how/why four problems make a family and to explore the inverse
relation between addition and subtraction. They generated all addition and subtraction problems
(with answers) in a set, while using manipulatives to show problems. The number knowledge
emphasis in CAL also had a strong focus on tens concepts and place value. Students practiced
counting by 10s with a number list; explored relations between ones and tens and the meaning of
zeros in the ones and tens places; used and regrouped manipulatives to represent 1- and 2-digit
numbers; and identified smaller and larger numbers using place value and the number list.
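The number-family idea described above (four problems built from the same three numbers) can be illustrated with a minimal sketch. The function names are hypothetical, and whether the program's sets include a part of 0 is an assumption made here for illustration only.

```python
def number_family(part_a, part_b):
    """Return the addition and subtraction facts sharing two parts and their whole."""
    whole = part_a + part_b
    facts = [
        f"{part_a} + {part_b} = {whole}",
        f"{part_b} + {part_a} = {whole}",
        f"{whole} - {part_a} = {part_b}",
        f"{whole} - {part_b} = {part_a}",
    ]
    # dict.fromkeys drops duplicates (e.g., doubles such as 3 + 3) and keeps order
    return list(dict.fromkeys(facts))


def families_in_set(target):
    """All families whose whole is `target` (assumption: a part of 0 is allowed)."""
    return {(a, target - a): number_family(a, target - a)
            for a in range(target // 2 + 1)}


print(number_family(2, 3))
# ['2 + 3 = 5', '3 + 2 = 5', '5 - 2 = 3', '5 - 3 = 2']
```

Partitioning the target number in each possible way, as `families_in_set(5)` does, mirrors the manipulative activity in which students derive all adding and subtracting problems in a set.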
The second major emphasis in CAL intervention was practice. Students were explicitly
taught and practiced efficient counting procedures for solving 1-digit problems and 2-digit plus
1-digit problems that do not require regrouping. Practice required students to generate many
correct responses to such problems to help them form long-term representations to support
retrieval. Students were also explicitly taught and practiced efficient procedures for identifying
when regrouping was required in addition and subtraction problems and for actual regrouping.
CAL intervention was divided into six units: (a) equal sign as a relational term; (b)
addition concepts and operational strategies for problems for which retrieval is a viable strategy
(problems where both operands are 1-digit or one operand is 1 digit and the other is 2 digits but
regrouping is not required); (c) concepts and operational strategies for similar problems
involving subtraction; (d) concepts and operational strategies for addition problems with
regrouping; (e) concepts and operational strategies for subtraction problems with regrouping; and
(f) review (although cumulative review was also integrated throughout the first five units).
Framework for WP Intervention
Our framework for studying word problems was based on Kintsch and colleagues
(Cummins, Kintsch, Reusser, & Weimer, 1988; Kintsch & Greeno, 1985; Nathan, Kintsch, &
Young, 1992), who pose that word-problem solving is an interaction between problem-solving
strategies and language comprehension processes. This model assumes that general features of
the text comprehension process apply across stories, informational text, and word-problem
statements, but the comprehension strategies, the nature of required knowledge structures, and
the form of resulting macrostructures and situation and problem models differ by task. According
to this model, memory representations of word problems have three components. The first
involves constructing a coherent structure of the text’s essential ideas. The second, the situation
model, requires supplementing the text with inferences based on the child’s world knowledge;
this includes informal knowledge about conceptual relations among quantities. The problem
solver coordinates this information with the third component – problem models or schema – to
formalize the conceptual relations among quantities. The schema guides application of solution
strategies. At second grade, combine, compare, and change problem types are the major schemas.
The model poses that this process of building the propositional text structure, inferencing,
identifying schema, and applying solution strategies makes strong demands on three cognitive
resources: working memory, reasoning ability, and language comprehension.
In terms of working memory and reasoning ability, consider a combine problem (two
parts are combined to make a total): Joe has 3 marbles. Tom has 5 marbles. Tom also has 2
balls. How many marbles do the boys have in all? The competent problem solver processes
sentence 1 to identify object = marbles; quantity=3; actor=Joe; but Joe’s role=unknown. This is
placed in short-term memory. In sentence 2, propositions are similarly coded and held in
memory. In sentence 3, balls fails to match the object code in sentences 1 and 2, signaling that 2
balls may be irrelevant; this is added to memory. In the question, the quantitative proposition
how many marbles and the phrase in all cue the problem solver that this problem falls in the
combine schema. So the problem solver assigns the role of superset (total) to the question;
checks information held in short-term memory to assign subset roles (the two parts); and rejects
2 balls as irrelevant. Filling in these slots of the schema in this way triggers a set of problem-
solving strategies. The hope is that with typical school instruction, children will gradually
construct the combine schema on their own, just as they devise their own strategies for handling
the demands on working memory and reasoning this problem-solving sequence involves.
This schema-based approach to WP intervention explicitly teaches children the
underlying structure of combine, compare, and change schema, using real-life scenarios and role
playing with stories that have no unknowns. Gradually, the teacher (a) transitions from complete
stories involving these informal relationships between quantities to problem statements with
missing information and (b) introduces graphic representations to formalize the quantitative
relations underlying each schema and provide opportunities for students to place knowns and
unknowns into the graphic representations. The teacher quickly transitions to “meta-equations”
that represent the schema and teaches step-by-step strategies that begin with identifying problem
statements as combine, compare, or change schema and then building the propositional text
structure. Schema-based instruction facilitates connections among the situation model, schema,
and productive solution strategies by making these connections explicit. It also provides children
with strategies that reduce demands on working memory and reasoning. The child RUNs through
the problem: Reads it, Underlines the question in which the object code (marbles) is revealed,
and Names the explicitly taught combine schema. This prompts the child to write the combine
meta-equation (P1 + P2 = T for the above problem). The child then re-reads the problem
statement. While re-reading, he/she replaces P1 and P2 with quantities for each relevant “part”
and crosses out irrelevant objects/numbers. This reduces the burden on working memory and
reasoning, as it provides the equation for problem solving and sets up the solution equation.

As Kintsch and colleagues discussed and Cummins et al. (1988) showed, however, word-
problem solving also relies heavily on language comprehension processes. As per Kintsch and
Greeno (1985), children “understand important vocabulary and language constructions prior to
school entry” (p. 111) and “through instruction in arithmetic and word problems, learn to treat
these words in a special, task-specific way, including extensions to ordinary usage for terms
(e.g., all or more) to more complicated constructions involving sets (in all and more than)” (p.
111). The assumption is that “students have the necessary language abilities to understand
problem statements and … form an appropriate problem model” (p. 330, Nathan et al., 1992).
But for many children, this assumption is shaky. Cummins et al. simulated incorrect problem
solving with two types of errors: incorrect math problem-solving processes versus language
processing errors. Correct problem representation depended more on language, and changing
wording in only minor ways dramatically affected accuracy. As Nathan et al. concluded,
instruction must “focus on language processes as well as the mathematical aspects of word-
problem solving” (p. 332). Our approach to schema-based instruction differs from other forms of
schema-based instruction (e.g., Jitendra, Star, Rodriguez, Lindell, & Someki, 2011; Jitendra et
al., 2009), in part, by providing explicit instruction on the language comprehension demands
specific to combine, compare, and change problem types. The purpose is to teach the subject-
matter-specific vocabulary and language constructions critical to these problem types and help
children treat this language in special, task-specific ways. The major challenges we address are
(a) underdeveloped representations of relational terminology and constructions (e.g., more/less
than; older; stronger) for compare problems; (b) discriminating relational vocabulary and
constructions from confusable ones (e.g., Tom has 5 fewer marbles than Jill, as in compare
problems, vs. Tom had 5 marbles and then he got 2 more, as in change problems); and (c) under-
developed representations of vocabulary related to quantities (e.g., amount refers to quantity) and
taxonomic relations at superordinate levels (e.g., 2 dogs+3 cats=5 animals; McGregor et al.,
2002), which are important for combine problems.
WP intervention was divided into five units: (a) foundational skills for the word-problem
content (i.e., equal sign as a relational term; strategies to find x; strategies for checking word-
problem work); (b) combine problems; (c) compare problems; (d) change problems; and (e)
review (although cumulative review was also integrated throughout the first four units). The
program typically provides explicit conceptual and strategy instruction on 1- and 2-digit
calculations (e.g., Fuchs et al., 2009), but for the present study, we removed all instruction on
calculations. When students asked questions or needed corrective feedback on calculations, they
were told to use the strategies they learned from their classroom teachers.
Linkages With Pre-Algebraic Knowledge
CAL and WP intervention incorporated instructional linkages to pre-algebraic
knowledge, as per Pillay et al. (1989). This occurred in two ways. First, both CAL and WP
intervention explicitly focused on understanding the equal sign as a relational symbol (Jacobs,
Franke, Carpenter, Levi, & Battey, 2007). Some work (Baroody & Ginsburg, 1983; Blanton &
Kaput, 2005) suggests that teachers’ consistent use of the phrase is the same as (instead of
equals) with young children is associated with improved understanding of the equal sign. Short-
term experiments with intermediate age students show that explicit instruction on the meaning or
location of the equal sign can enhance equal sign understanding and performance on open,
nonstandard equations (e.g., 6 + 4 + 7 = 6 + __; McNeil & Alibali, 2005; Rittle-Johnson &
Alibali, 1999). Powell and Fuchs (2010) showed that third graders with mathematics difficulty
who received schema-based tutoring plus equal-sign instruction performed better than students
who received schema-based tutoring alone on closed equations and some types of word
problems.
Second, as discussed, WP intervention taught children to represent the underlying
structure of schemas in terms of “meta-equations”: for combine problems, P1 + P2 = T; for
compare problems, Bigger minus Smaller = Difference (B – S = D); for change problems, Start
plus/minus Change = End (ST +/- C = E). Children were taught to identify the problem type and
write the corresponding meta-equation; re-read while replacing slots in the meta-equation with
information from the problem statement (including x for the unknown); and solve for x (x could
occur in any of the three slots of the equation). This has been shown to encourage pre-algebraic
thinking in second graders (Fuchs, Zumeta, et al., 2010). Because WP intervention provided this
additional linkage with algebraic thinking over CAL and because WPs may involve greater
symbolic complexity than calculations (as outlined in the introduction), we expected WP
intervention to produce stronger pre-algebraic knowledge than CAL.
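To make the "solve for x in any slot" step concrete, here is a minimal sketch of the three meta-equations as data. The schema keys and the function name are hypothetical; only the equations themselves (P1 + P2 = T, B – S = D, ST +/- C = E) come from the text.

```python
# Meta-equations as described in the text; the schema keys are illustrative names.
META_EQUATIONS = {
    "combine": ("P1", "+", "P2", "T"),
    "compare": ("B", "-", "S", "D"),
    "change_plus": ("ST", "+", "C", "E"),
    "change_minus": ("ST", "-", "C", "E"),
}


def solve_for_x(schema, **slots):
    """Return the value of the single slot passed as None (the unknown x)."""
    left_name, op, right_name, result_name = META_EQUATIONS[schema]
    left, right, result = slots[left_name], slots[right_name], slots[result_name]
    if [left, right, result].count(None) != 1:
        raise ValueError("exactly one slot must be unknown")
    sign = 1 if op == "+" else -1  # equation: left + sign * right = result
    if result is None:
        return left + sign * right
    if left is None:
        return result - sign * right
    return sign * (result - left)  # the right-hand slot is the unknown


# The marbles problem from the WP framework section: 3 + 5 = x
print(solve_for_x("combine", P1=3, P2=5, T=None))  # 8
```

The same call pattern covers an unknown in the first or second slot, for example `solve_for_x("compare", B=None, S=4, D=5)`.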

Teachers’ Classroom Instruction and Distinctions Between Control and the CAL/WP
Intervention
Classroom teachers relied primarily on the basal program Houghton Mifflin Math
(Greenes et al., 2005) to guide mathematics instruction. Their curricular content aligned with the
content in CAL intervention (1- and 2-digit adding and subtracting) and WP intervention
(combine, compare, and change word problems). In this way, control students received
calculation and word-problem instruction relevant to the study. The amount of whole-class
instruction was comparable in all three conditions, but tutored children in CAL and WP
intervention received more instruction than some of the children in the control group who would
have been eligible for tutoring (instruction was of similar time for control group students who
participated in the school’s intervention period in math). Results, however, indicated no
interaction between tutoring eligibility status and treatment condition.
Based on analysis of Houghton Mifflin Math (Greenes et al., 2005) and teacher reports,
key distinctions between the control and CAL conditions were as follows. (1) Control group
instruction did not address the equal sign as a relational term. (2) Control group instruction
focused less on number knowledge and more on procedures. (3) CAL provided greater emphasis
on development of fluency with efficient counting strategies. Important commonalities between
the control and CAL conditions were as follows: instruction addressed one problem type at a
time; provided students with explicit steps for deriving solutions; and relied on worked
examples, guided group practice, and independent practice with corrective feedback.
Key distinctions between the control and WP conditions were as follows. (1) Control
group instruction did not address the equal sign as a relational term. (2) Control group instruction
emphasized a metacognitive approach to solving word problems, in which students used guided
generic questions (not specific to problem types) to plan, solve, and reflect on the content of
word problems; WP did not employ this general set of metacognitive strategies. (3) In contrast to
WP, there was no attempt in the control condition to explicitly teach students to understand word
problems in terms of the combine, compare, or change schemas. (4) Control group instruction
provided strong emphasis on computational requirements for problem solution; WP intervention
provided none. (5) Although control group instruction allocated 3 weeks of instruction to finding
missing addends, it focused substantially less on this topic and did not connect this topic to the
structure of word problems. (6) Control group instruction taught children to rely on keywords
(e.g., more is a signal to add the numbers in the problem), a common approach in schools; WP
avoided keywords (because they only produce correct solutions ~50% of the time). Important
commonalities between the control and WP conditions were as follows: instruction addressed
one problem type at a time; focused on concepts underlying the problem type; provided students
with explicit steps for deriving solutions; and relied on worked examples, guided group practice,
and independent practice with corrective feedback.
Structure of Whole-Class CAL and WP Intervention
Whole-class CAL and WP instruction occurred for 17 weeks, two lessons per week, each
40-45 min. RA teachers delivered instruction using lesson guides, included in the program
manuals, to promote fidelity. Guides were studied, not read or memorized, to support an
authentic teaching style. As in Math Wise and Pirate Math, CAL and WP whole-class instruction
is explicit. Each session begins with the teacher explaining worked examples and moving to
guided practice, with constant student participation in applying demonstrated strategies in
partially worked and then unworked problems. This guided practice is conducted in large-group
format. Then children work cooperatively on problems in pairs, while the teacher circulates to
provide help as needed. The lesson ends with independent practice on the day’s topic. Corrective
feedback is systematically provided, and cumulative review is systematically incorporated.
In CAL whole-class instruction, this general framework is captured in four activities per
lesson. The first, the Daily Lesson, focuses on six to eight problems on the day’s topic. This
involves the teacher presenting worked examples and moving to partially worked and then
unworked problems, with constant student participation. The second activity is Partner Work, in
which paired students (a higher- and lower-performer) work in a structured manner on 12
problems on that day’s topic, as coach and player. As validated with Peer-Assisted Learning
Strategies (Fuchs et al., 1997), the higher-performing student begins as coach, who asks the
player step-by-step questions to model the teacher’s solution strategy and provides corrective
feedback. This occurs on three problems. Then the player “thinks-aloud” the solution strategy on
the next three problems, while the coach monitors and provides corrective feedback. Students
then switch roles, so the lower-performer asks his/her partner to model the solution strategy on
problems 7-9, while providing corrective feedback, and finally the higher performer talks aloud
the final three problems. The teacher circulates while providing assistance. The third activity is
Time Owl (beginning Lesson 15), which presents a page of different types of calculation
problems. The teacher provides a directive (e.g., “solve addition problems that require
regrouping”); then students have 1 min to find and complete that type of problem. At the end of
1 min, students switch Time Owls with partners and check answers. The last activity, Individual
Practice, provides students with a practice sheet of 25 problems on the day’s topic and
cumulative review. At the end of 5 min, students switch papers to check answers. During the
final three activities, students earn points for working with partners cooperatively and producing
accurate work. The pair with the most points collects Math Wise folders.
WP whole-class instruction also includes four activities per lesson. The first is the
Teacher-Led Problem, in which the teacher reviews and leads discussion about a problem from
the previous day’s lesson. Each student has a copy of that problem, which the teacher and
students work through together. The second activity is the Daily Lesson, focused on that day’s
topic. This involves the teacher presenting worked examples and moving to partially-worked and
then to unworked problems, with constant student participation. The third activity is Partner
Work, in which students work in pairs to solve two word problems on the day’s topic. After ~8
min, the teacher shares answers with the class, against which the pair compares its work. The
final activity is Individual Practice, in which students complete five find x problems and one
word problem. As each student finishes, the teacher checks his/her work. Students earn points for
each part of a correctly applied solution strategy and record scores on a “treasure map.”
Structure of CAL and WP Tutoring
CAL and WP tutoring provides 3 sessions per week, each for 25-30 min for 13 weeks
(beginning in Weeks 4-5 of whole-class instruction), in a quiet space outside the classroom. Each
group includes 2-3 students. The content of tutoring mirrors the content of whole-class
instruction. However, tutoring targets the most difficult concepts from classroom instruction;
incorporates manipulatives more frequently; employs additional scaffolding to support learning;
teaches additional strategies; and incorporates self-regulated learning strategies with tangible
reinforcement. Throughout each lesson, tutors continuously engage students by posing questions
to one student or the group. Tutors set a timer at random intervals (as prescribed in the lesson
guide) and award a checkmark if all students are on-task (e.g., following directions, working
hard) each time the timer beeps. (Students do not know when the timer will beep.) At the end of
lessons, tutors also award individual points for correct answers on “bonus problems,” which are
prescribed to tutors in the lesson guide. (Students do not know which problems will earn points
until the end of the lesson.) Students color an owl/pirate for each checkmark earned. When 16
owls/pirates are colored, they pick a small prize.
Each CAL tutoring lesson comprises six activities. The first is Number Combination
Flash Cards (through Lesson 13, the focus is addition; then addition and subtraction). The tutor
shows cards to students in round robin format for 1 min. When an error occurs, responding
pauses while the student uses the taught counting strategy to derive the correct answer. At the
end of 1 min, correctly answered cards are counted. Then the deck is reshuffled for another 1-
min round, in which students try to beat that score. The group graphs the higher score. The
second activity is the Number Combination Lesson. The tutor reviews number knowledge
content (e.g., cardinality, mental representation of the number line, inverse relation between
addition and subtraction, commutativity for addition) and efficient counting strategies (i.e., min
strategy for adding; missing addend strategy for subtracting). Lessons begin with +1/-1, moving
to +0/-0 and then doubles. Next, lessons focus on number combinations sets and families, using
manipulatives to build connections. The third activity is Owl Belly, for which each student’s
worksheet shows eight owls, each with a number on its belly. The tutor reads a problem (e.g., 9 +
3); students color the owl with the answer (12). Tutors present six number combinations so two
owl bellies remain uncolored. Then, students create one addition problem and one subtraction
problem for the remaining owls. For the fourth activity, Double-Digit Flash Cards, students
respond in round-robin style for 1 min by saying whether each problem requires addition or
subtraction (Lessons 1-21) or whether the problem requires regrouping and explaining why
(Lessons 22-39). In the next activity, the Double-Digit Lesson, tutors review stories and rhymes,
taught in whole-class instruction, which capture place value concepts and the number knowledge
basis for regrouping. Then students complete six related problems, using Base-10 blocks to
illustrate regrouping concepts. The final activity is Paper-and-Pencil Review. Students have 1
min to complete 10 1-digit problems and 2 min to complete five 2-digit problems. Tutors then
provide corrective feedback.
Each WP tutoring lesson comprises four activities. The first is the Game, which differs by lesson
topic: Find X Game, Total Game (filling in equations based on orally-presented problems), or
Difference Game (underlining and drawing > or < between two compared amounts in word
problems). In the second activity, the Daily Lesson, students complete 3-4 word problems using
strategies taught in whole-class instruction or tutoring; most problems are aligned with the unit’s
topic, but cumulative review across problem types is incorporated. With the third activity, the
Sorting Game, the tutor reads word problems; students name schemas in round-robin style for 2
min; the tutor provides corrective feedback. For the final activity, Paper-Pencil Review, students
work individually on find x problems for 2 min and a word problem for 2 min. Tutors provide
corrective feedback.
Research Assistant (RA) Teachers and Tutors
Across the four cohorts, the typical RA was one to two years beyond undergraduate
education, studying for a graduate degree in education, special education, counseling, or
education policy. Ten RA-teachers and 30 RA-tutors worked 1 year; five RA-teachers and four
RA-tutors worked more than one year. One full-time project coordinator also served as an RA-
teacher, as she led the RA-teachers; another served as an RA-tutor, while she led the RA-tutors.
Different sets of RAs conducted whole-class instruction versus tutoring. To control for RA
quality, all RA-teachers taught classrooms in the CAL and WP conditions; all RA-tutors taught
groups in the CAL and WP conditions. We guarded against contagion by color coding lesson
guides and teaching materials by condition; reviewing distinctions between the two conditions in
weekly meetings; and providing corrective feedback on live observations and audiotapes of
whole-class and tutoring sessions.
To prepare and support RAs, we began with an introductory meeting, in which we
discussed research ethics and procedures, including professional behavior in schools. Then RAs
were prepared for their specific role (teacher or tutor) in two full-day sessions, in which RAs
were introduced to the project and its goals, assigned roles (teacher or tutor), and provided
instruction, demonstrations, and lesson guides to study. Next, they were paired to practice
designated first-unit lessons. Then, they conducted one lesson for a project coordinator (they did
not know ahead which lesson) and were judged on a point-by-point system for fidelity. RAs who
achieved 95% fidelity were deemed reliable. Those who scored below 95% were coached on
missed points, required to practice more, and then re-rated at a later time on another lesson. This
recurred until the RA met criteria. At weekly meetings, project coordinators and the first author
met for problem-solving sessions separately with RA-teachers and with RA-tutors. At the
beginning of each unit, a 3-hour session oriented RAs. Also, all sessions were audiotaped, and a
sample of live observations was conducted for each RA. On that basis, project coordinators
provided written and oral corrective feedback and one-to-one retraining as needed. RA-teachers
and RA-tutors met periodically to share observations of students they shared between whole-
class instruction and tutoring.
Fidelity
Prior to the first whole-class and tutoring session, research staff agreed on the essential
information in each lesson and made a checklist of points for each lesson. This was done for
CAL and WP whole-class instruction and tutoring. Each session was audiotaped. At the study’s
end, RAs independently listened to a random sample of tapes while completing checklists to
identify the percentage of points addressed. We sampled 20% of whole-class instruction tapes
equitably within conditions, RA-teachers, classrooms, and lesson types; we sampled 20% of
tutoring tapes equitably within conditions, RA-tutors, tutoring groups, and lesson types. For
whole-class intervention, the mean percentage of points addressed was 95.87 (SD = 1.40) for
CAL and 94.86 (SD = 1.85) for WP, t(16) = 1.70, p = .110. For tutoring, the mean percentage of
points addressed was 96.06 (SD = 2.83) for CAL and 96.34 (SD = 3.28) for WP, t(35) = 0.58, p =
.563. (In these fidelity analyses, teacher/tutor was the unit of analysis; t tests are for dependent
samples because each teacher/tutor taught in both conditions.)
Supplemental Results
Supplemental analyses were conducted, analogous to those presented in the published
manuscript. Given the quasi-experimental nature of those results (see text), here we provide
results from the subsample that did not encounter randomization issues. This subsample was of
55 teachers and 483 students in 18 schools. The majority of the excluded sample was from the
first cohort of students.
Table S1 shows raw score and standard score means and SDs for all individual measures
that were combined into the outcomes. As with the sample as a whole, there were no significant
differences as a function of treatment condition on any pretest measure. Tables S2 to S4 show
results for the calculations, word-problems, and pre-algebraic knowledge, respectively. These
tables followed directly from those in the published manuscript, for maximum comparability.
One consistent effect in the results that follow is that standard errors, particularly for random
effects, were increased, particularly for school, which is not surprising given the smaller sample;
in addition, there were fewer effects of specific included covariates. We did run exploratory
analyses that eliminated non-significant effects in this smaller sample, and others that eliminated
the random effect of school, but in no case did these change the treatment effect presented below,
except where indicated. In general, results were highly similar to those of the full sample.
Therefore, descriptions below focus on differences.
Proximal Calculation Effects (Table S2, Left Side)
For the unconditional model, the ICCs were .13 for teacher and .03 for school. Model fit
statistics: deviance = 1330.1, parameters = 3, AIC = 1336.1, BIC = 1338.7. In the final full
model, there were significant effects for pretest, p < .001, for cohort, p < .001, and for treatment
condition, F(2, 54.1) = 4.42, p < .017. Follow-up to the treatment effect indicated CAL
outperformed control (p < .006; ES = 0.47); the difference between CAL and WP was now p
< .054, with ES = 0.29; as in the original analyses, WP and control did not differ (p = .256;
ES = 0.17). Model fit statistics for the full/final model were: deviance = 1024.8, parameters = 2,
AIC = 1028.8, BIC = 1030.6.
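The fit statistics reported above appear consistent with the usual relation AIC = deviance + 2k, where k is the reported number of parameters. The short check below is a sketch under that assumption (deviance taken as -2 log-likelihood); `aic` is an illustrative helper. BIC is not reproduced because it depends on a sample-size term that is not stated here.

```python
def aic(deviance, n_params):
    """AIC = deviance (-2 log-likelihood) + 2 * number of estimated parameters."""
    return deviance + 2 * n_params

# Values reported above for the proximal calculation models.
unconditional = aic(1330.1, 3)
final = aic(1024.8, 2)
print(round(unconditional, 1), round(final, 1))  # 1336.1 1028.8
```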
Distal Calculation Effects (Table S2, Right Side)
The unconditional model showed an ICC of .07 for teacher and .02 for school (model fit:
deviance = 1239.4, parameters = 3, AIC = 1245.4, BIC = 1248.1). In the final model (fit:
deviance = 1030.1, parameters = 2, AIC = 1034.1, BIC = 1035.9), there were significant effects
for pretest (p < .001), ethnicity (p < .001), RFL status (p < .016), tutoring eligibility status (p
< .002), and treatment condition, F(2, 42.2) = 9.84, p < .001. Unlike the model of the full
sample, here sex (p = .116) was not significant, but cohort was, F(2, 49.5), p < .001; cohorts 3
and 4 outperformed cohort 2. Follow-up to the treatment effect indicated CAL outperformed WP
(p < .011; ES = 0.24) and control (p < .001; ES = 0.39), but now WP outperformed control as
well (p = .042; ES = 0.15), although it did not if sex was eliminated from the model (p = .053).
Proximal Word-Problem Effects (Table S3, Left Side)
The unconditional model showed an ICC of .34 for teacher and 0 for school (model fit:
deviance = 3228.4, parameters = 2, AIC = 3232.4, BIC = 3234.2). In the final model (fit:
deviance = 2907.7, parameters = 3, AIC = 2913.7, BIC = 2916.4), there were significant effects
for pretest (p < .001), sex (p < .023), and treatment condition, F(2, 41.8) = 47.25, p < .001. As in
the full sample, follow up to the treatment effect indicated that WP outperformed CAL (p < .001;
ES = 1.21) and control (p < .001; ES = 1.16), but the latter two groups did not differ (p = .656;
ES = -0.05). Unlike the model of the full sample, here tutoring eligibility status was not
significant (p = .178), and neither was cohort, p = .250. There remained a significant interaction
of treatment and cohort, p < .003, which suggested relative differences between WP and the
other groups as a function of cohort; however, WP consistently outperformed the other two
groups in each cohort, though the difference between WP and control was not significant in
cohort 2.
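The ICCs reported for the unconditional models can be read as each level's share of the total variance. The sketch below assumes that standard definition and uses the unconditional variance components from Table S3 for the proximal word-problem outcome; `icc` is an illustrative helper, and entering the unestimated school variance as 0 is an assumption made for the example.

```python
def icc(variances):
    """Share of total variance at each level, given variance components."""
    total = sum(variances.values())
    return {level: var / total for level, var in variances.items()}

# Unconditional-model variance components (Table S3, proximal word problems).
# School-level variance is not reported for this model; it is entered as 0 here.
proximal_wp = {"teacher": 20.00, "school": 0.0, "residual": 38.76}
shares = icc(proximal_wp)
print(round(shares["teacher"], 2), round(shares["school"], 2))  # 0.34 0.0
```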
Distal Word-Problem Effects (Table S3, Right Side)
The unconditional model showed an ICC of 0 for teacher and .06 for school (model fit:
deviance = 1163.7, parameters = 2, AIC = 1167.8, BIC = 1169.5). In the final model (fit:
deviance = 763.0, parameters = 2, AIC = 767.0, BIC = 768.8), there were significant effects for
pretest (p < .001), tutoring eligibility status (p < .001), ESL status (p < .002), and ethnicity (p
< .001), but not for cohort, p = .901, or for treatment, p = .898 (as was the case with the full sample;
ES range, .03 to .15). Unlike the full sample model, the prior effects for sex (p = .139) and for
RFL (p = .058) were not significant.
Proximal Pre-Algebraic Knowledge Effects (Table S4, Left Side)
The unconditional model showed an ICC of .68 for teacher and 0 for school (model fit:
deviance = 842.4, parameters = 2, AIC = 846.4, BIC = 848.1). In the final model (fit: deviance =
627.1, parameters = 3, AIC = 633.1, BIC = 635.8), there was a significant effect for the
word-problem pretest (p < .001), although no longer for the computation pretest (p = .178),
RFL status (p = .160), ESL status (p = .140), or ethnicity (p = .067). Tutoring eligibility status
and cohort were also not significant (both p > .05), as in the full sample model, and the
interaction of treatment and cohort was also no longer significant (p = .376). However, as in the
full sample model, treatment was significant, F(2, 33.8) = 77.42, p < .001. Follow up to the
treatment effect indicated WP outperformed CAL (p < .001; ES = 1.46) and control (p < .001;
ES = 1.63), but the latter two groups did not differ (p = .232; ES = 0.16).
Distal Pre-Algebraic Knowledge Effects (Table S4, Right Side)
The unconditional model showed an ICC of .02 for teacher and .01 for school (model fit:
deviance = 1265.5, parameters = 3, AIC = 1271.5, BIC = 1274.1). In the final model (fit:
deviance = 931.5, parameters = 1, AIC = 933.5, BIC = 934.4), there were significant effects for
both pretests (both p < .001), ethnicity (p < .001), ESL status (p < .030), tutoring eligibility status
(p < .010), though no longer for sex (p = .351), ESL status (p = .198), or cohort (p = .556). As in
the full sample, there was a significant treatment effect, F(2, 434) = 3.72, p < .025. WP
outperformed CAL (p < .048; ES = 0.25) and control (p < .012; ES = 0.28); the latter two groups
did not differ (p = .385; ES = 0.03).

Table S1
Performance Data by Treatment Condition (n = 55 Teachers, n = 483 Students)
                                CAL                             WP                              Control
                                Pre             Post            Pre             Post            Pre             Post
Variable                        X (SD)          X (SD)          X (SD)          X (SD)          X (SD)          X (SD)
Screening
  Sums to 12                    8.43 (4.73)     NA              8.31 (4.75)     NA              8.89 (5.19)     NA
  Story Problems                6.57 (3.71)     NA              6.70 (3.47)     NA              6.60 (3.47)     NA
  WASI IQ                       92.73 (13.47)   NA              93.99 (13.39)   NA              91.95 (12.20)   NA
Calculation Outcomes
  Proximal
    Sums to 18                  6.34 (3.72)     12.40 (5.35)    5.95 (3.60)     11.43 (5.06)    5.71 (3.53)     10.22 (5.48)
    Minuends to 12              4.53 (2.59)     8.93 (5.49)     4.62 (2.88)     8.39 (4.41)     4.29 (2.81)     7.70 (4.71)
    Minuends to 18              3.37 (2.72)     7.89 (4.44)     3.04 (2.80)     7.19 (4.43)     2.92 (2.38)     6.09 (4.84)
    2-Digit Addition            3.57 (3.71)     15.52 (4.80)    3.10 (3.33)     12.23 (5.79)    3.64 (3.39)     11.42 (6.69)
    2-Digit Subtraction         2.05 (2.50)     9.05 (5.18)     2.13 (2.49)     6.52 (4.03)     2.37 (2.71)     6.88 (4.60)
  Distal
    WRAT-Raw                    19.11 (2.69)    23.73 (2.05)    19.41 (2.46)    23.24 (2.19)    19.00 (2.71)    22.71 (2.76)
      -Standard                 93.56 (12.36)   102.73 (10.24)  94.63 (12.74)   100.59 (10.41)  92.31 (13.33)   98.41 (12.66)
    Key Math Addition-Raw                       9.72 (2.22)                     9.04 (2.33)                     8.82 (2.49)
      -Standard                                 109.16 (12.80)                  105.86 (13.69)                  104.27 (14.31)
    Key Math Subtraction-Raw                    6.07 (2.39)                     5.45 (2.34)                     5.42 (2.40)
      -Standard                                 103.25 (14.61)                  100.75 (14.30)                  99.73 (15.22)
Word-Problem Outcomes
  Proximal
    2nd Grade Story Problems    8.23 (5.00)     12.13 (6.18)    8.04 (5.28)     19.61 (7.63)    7.29 (3.99)     12.38 (5.88)
  Distal
    Key Math Prob Solving-Raw   2.29 (1.65)     3.63 (2.11)     2.35 (1.58)     3.83 (1.89)     2.38 (1.86)     3.58 (1.97)
      -Standard                 103.38 (9.65)   105.03 (10.96)  104.17 (9.58)   107.03 (10.54)  103.96 (10.99)  104.58 (11.03)
    Iowa-Raw                    NA              16.50 (5.53)    NA              16.47 (5.23)    NA              16.45 (5.14)
      -Standard                 NA              156.78 (19.22)  NA              156.66 (18.18)  NA              156.54 (17.48)
Pre-Algebra
  Proximal
    Find X                      NA              4.34 (3.06)     NA              7.00 (1.46)     NA              4.37 (3.09)
    Number Sentences            NA              1.17 (1.61)     NA              3.58 (2.53)     NA              0.83 (1.36)
  Distal
    Dynamic Assessment          NA              7.98 (4.38)     NA              8.90 (4.43)     NA              8.09 (4.89)
    Test of Pre-Algebra         NA              10.92 (5.30)    NA              11.83 (4.84)    NA              10.46 (5.15)

Table S2
Model Characteristics for Proximal and Distal Calculation Outcomes (n = 55 Teachers, n = 483 Students)
                          Proximal                                           Distal
Parameter                 Estimate (S.E.)  df     t/z/F    95% CI            Estimate (S.E.)  df     t/z/F    95% CI
Unconditional Model
  Fixed:
    Intercept (γ000)      0.10 (0.08)      10.9   1.18     -0.08 to 0.27     0.02 (0.06)      10.2   0.29     -0.11 to 0.15
  Random:
    Teacher (σ²_u0)       0.13 (0.05)             2.40b    0.06 to 0.35      0.05 (0.03)             1.63a    0.02 to 0.29
    School                0.03 (0.05)             0.65     0.01 to 103.0     0.01 (0.02)             0.46     0.00 to 146
    Residual (σ²_e)       0.84 (0.06)             14.61c   0.74 to 0.97      0.72 (0.05)             14.58c   0.63 to 0.83
Full/Final Model
  Fixed:
    Intercept (γ000)      -0.26 (0.11)     49.7   -2.25a   -0.49 to -0.03    0.13 (0.14)      96.1   0.96     -0.14 to 0.40
    Calculations Pretest  0.61 (0.04)      434    14.15c   0.52 to 0.69      0.35 (0.04)      428    8.26c    0.30 to 0.44
    Tutoring Eligibility  -0.05 (0.09)     434    -0.49    -0.23 to 0.14     -0.29 (0.09)     427    -3.09b   -0.48 to -0.11
    Sex (Female)          NA                                                 0.11 (0.07)      422    1.57     -0.03 to 0.25
    Lunch (Free/Reduced)  NA                                                 -0.27 (0.11)     321    -2.43a   -0.48 to -0.06
    Ethnicity (A Amer)    NA                                                 -0.30 (0.11)     363    -2.83b   -0.51 to -0.09
    Ethnicity (Hispanic)  NA                                                 -0.01 (0.11)     383    -0.05    -0.23 to 0.22
    Ethnicity (Other)     NA                                                 0.42 (0.16)      416    2.70b    0.12 to 0.73
    Treatment (CAL)       0.41 (0.14)      58.3   2.88b    0.12 to 0.69      0.46 (0.11)      48.1   4.36c    0.25 to 0.67
    Treatment (WP)        0.16 (0.14)      56.5   1.15     -0.12 to 0.44     0.22 (0.10)      50.7   2.08a    0.01 to 0.43
    Cohort (2)            -0.07 (0.14)     61     -0.51    -0.34 to 0.20     -0.29 (0.10)     41.7   -2.83b   -0.49 to -0.08
    Cohort (3)            0.41 (0.12)      58.9   3.36b    0.17 to 0.66      0.13 (0.09)      51.6   1.47     -0.04 to 0.32
  Random:
    Teacher (σ²_u0)       0.09 (0.03)             2.79b    0.05 to 0.21      0.01 (0.02)             0.44     0.00 to 407
    School                --               --     --       --                --               --     --       --
    Residual (σ²_e)       0.51 (0.03)             13.88c   0.44 to 0.59      0.55 (0.04)             13.69c   0.48 to 0.64

Note. a p < .05; b p < .01; c p < .001. CAL is calculation intervention. WP is word-problem intervention. Proximal and distal outcomes are standardized composites. Covariance terms are not shown. For the unconditional model, t values are reported. For full/final model, F-values are reported for the fixed portion of model; z-values for the random portion of model. For dfs, numerators (not included) are all 1; denominators (reported) use the Kenward-Roger approximation. Under Fixed Effects, intercept refers to the outcome variable. Pretest is grand-mean centered. The code 0 is for the treatment control group, for no free lunch, for Caucasian, for ineligible for tutoring, and for cohort 4. For meaning of symbols under Random Effects, see text, although symbols in the table stand for variance components (e.g., σ²_e denotes the student-level residual variance).
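To illustrate how the reference coding in the note can be read, the sketch below combines the fixed-effect estimates for the proximal calculation outcome (Table S2, full/final model) into a fixed-effects-only predicted composite. It ignores the random teacher effect, treats the rounded table estimates as exact, and uses `predicted_composite` as a hypothetical helper name; it is an illustration of the coding scheme, not the model-fitting procedure.

```python
# Fixed-effect estimates copied from Table S2 (proximal outcome, full/final model).
INTERCEPT = -0.26
PRETEST_SLOPE = 0.61        # per unit of grand-mean-centered pretest
TUTORING_ELIGIBLE = -0.05   # 0 = ineligible (reference), 1 = eligible
TREATMENT = {"control": 0.0, "CAL": 0.41, "WP": 0.16}  # control is the reference
COHORT = {4: 0.0, 2: -0.07, 3: 0.41}                   # cohort 4 is the reference

def predicted_composite(treatment, cohort, pretest_centered=0.0, eligible=False):
    """Fixed-effects-only prediction; random teacher effects are ignored."""
    return (INTERCEPT
            + PRETEST_SLOPE * pretest_centered
            + TUTORING_ELIGIBLE * int(eligible)
            + TREATMENT[treatment]
            + COHORT[cohort])

# A student at the mean pretest who is ineligible for tutoring:
print(round(predicted_composite("control", 4), 2))  # -0.26
print(round(predicted_composite("CAL", 3), 2))      # 0.56
```

With every predictor at its reference value (and pretest at the grand mean), the prediction reduces to the intercept.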

Table S3
Model Characteristics for Proximal and Distal Word-Problem Outcomes (n = 55 Teachers, n = 483 Students)
                          Proximal                                           Distal
Parameter                 Estimate (S.E.)  df     t/z/F    95% CI            Estimate (S.E.)  df     t/z/F    95% CI
Unconditional Model
  Fixed:
    Intercept (γ000)      14.98 (0.66)     55.8   22.63c   13.65 to 16.30    -0.03 (0.06)     14     -0.48    -0.17 to 0.11
  Random:
    Teacher (σ²_u0)       20.00 (4.78)            4.18c    13.16 to 34.03    --               --     --       --
    School                --               --     --       --                0.04 (0.03)             1.65a    0.02 to 0.23
    Residual (σ²_e)       38.76 (2.66)            14.57c   34.03 to 44.54    0.65 (0.04)             15.12c   0.57 to 0.74
Full/Final Model
  Fixed:
    Intercept (γ000)      11.56 (1.02)     33     11.36c   9.49 to 13.63     0.23 (0.11)      66.9   2.07a    0.01 to 0.45
    Word-Problem Pretest  5.08 (0.35)      459    14.46c   4.39 to 5.78      0.75 (0.04)      454    20.14c   0.68 to 0.83
    Tutoring Eligibility  -0.86 (0.64)     460    -1.35    -2.12 to 0.39     -0.23 (0.06)     458    -3.58c   -0.35 to -0.10
    Sex (Female)          1.06 (0.46)      444    2.28a    0.15 to 1.97      -0.07 (0.05)     453    -1.48    -0.17 to -0.02
    Lunch (Free/Reduced)  NA                                                 -0.14 (0.07)     451    -1.90    -0.29 to 0.00
    Ethnicity (A Amer)    NA                                                 -0.19 (0.08)     439    -2.49a   -0.33 to -0.04
    Ethnicity (Hispanic)  NA                                                 0.07 (0.09)      441    0.86     -0.10 to 0.24
    Ethnicity (Other)     NA                                                 0.18 (0.11)      456    1.62     -0.04 to 0.39
    Second Language       NA                                                 -0.26 (0.08)     457    -3.18b   -0.42 to -0.10
    Treatment (CAL)       1.33 (1.34)      51.4   1.00     -1.35 to 4.01     0.02 (0.07)      441    0.29     -0.12 to 0.18
    Treatment (WP)        8.31 (1.37)      53.3   6.07c    5.56 to 11.05     0.03 (0.07)      455    0.46     -0.11 to 0.17
    Cohort (2)            1.47 (1.77)      37.4   0.84     -2.10 to 5.06     0.04 (0.10)      23.1   0.41     -0.17 to 0.25
    Cohort (3)            0.98 (1.53)      38.2   0.64     -2.11 to 4.07     0.04 (0.10)      16.8   0.41     -0.16 to 0.24
  Random:
    Teacher (σ²_u0)       2.12 (1.25)             1.70a    0.87 to 10.67     --               --     --       --
    School                1.21 (1.11)             1.09     0.35 to 28.90     0.02 (0.01)             1.78a    0.01 to 0.08
    Residual (σ²_e)       24.50 (1.70)            14.41c   21.48 to 28.20    0.26 (0.02)             14.98c   0.23 to 0.30
Note. See note for Table S2. Parameter estimates for the significant treatment by cohort interaction for Proximal Word Problems are not shown in the table, but were included in the model from which the above numbers were derived, and are described in text.

Table S4
Model Characteristics for Proximal and Distal Pre-Algebra Outcomes (n = 55 Teachers, n = 483 Students)
                          Proximal                                           Distal
Parameter                 Estimate (S.E.)  df     t/z/F    95% CI            Estimate (S.E.)  df     t/z/F    95% CI
Unconditional Model
  Fixed:
    Intercept (γ000)      -0.03 (0.06)     56.2   -0.47    -0.15 to 0.09     -0.00 (0.05)     11.2   0.09     -0.11 to 0.12
  Random:
    Teacher (σ²_u0)       0.18 (0.04)             4.41c    0.12 to 0.30      0.02 (0.03)             0.71     0.00 to 17.25
    School                --               --     --       --                0.01 (0.02)             0.57     0.00 to 384
    Residual (σ²_e)       0.27 (0.02)             14.56c   0.24 to 0.31      0.77 (0.05)             14.53c   0.68 to 0.89
Full/Final Model
  Fixed:
    Intercept (γ000)      -0.22 (0.10)     57.8   -2.20a   -0.43 to -0.02    0.09 (0.10)      434    0.87     -0.11 to 0.29
    Word-Problem Pretest  0.32 (0.04)      419    8.14c    0.24 to 0.40      0.55 (0.06)      434    9.90c    0.44 to 0.66
    Calculations Pretest  0.04 (0.03)      427    1.35     -0.02 to 0.10     0.21 (0.04)      434    4.99c    0.13 to 0.29
    Tutoring Eligibility  -0.07 (0.06)     428    -1.10    -0.19 to 0.05     -0.22 (0.09)     434    -2.58a   -0.39 to -0.05
    Sex (Female)          NA                                                 -0.06 (0.06)     434    -0.93    -0.18 to -0.07
    Lunch (Free/Reduced)  -0.10 (0.07)     370    -1.41    -0.24 to 0.04     NA
    Second Language       -0.11 (0.08)     374    -1.48    -0.26 to 0.04     -0.13 (0.10)     434    -1.29    -0.33 to 0.07
    Ethnicity (A Amer)    -0.08 (0.07)     393    -1.19    -0.22 to 0.05     -0.31 (0.09)     434    -3.42c   -0.49 to -0.13
    Ethnicity (Hispanic)  0.09 (0.08)      382    1.08     -0.07 to 0.24     -0.00 (0.10)     434    -0.00    -0.20 to 0.20
    Ethnicity (Other)     0.08 (0.11)      410    0.76     -0.12 to 0.28     0.22 (0.13)      434    1.61     -0.05 to 0.48
    Treatment (CAL)       0.15 (0.11)      43.3   1.36     -0.07 to 0.38     0.08 (0.09)      434    0.87     -0.10 to 0.26
    Treatment (WP)        0.77 (0.12)      46.4   6.66c    0.54 to 1.00      0.23 (0.09)      434    2.52a    0.05 to 0.40
    Cohort (2)            -0.04 (0.14)     32.7   -0.26    -0.32 to 0.25     0.01 (0.09)      434    0.13     -0.16 to 0.18
    Cohort (3)            0.02 (0.12)      38.2   0.15     -0.23 to 0.27     0.08 (0.08)      434    1.01     -0.07 to 0.23
  Random:
    Teacher (σ²_u0)       0.01 (0.01)             0.91     0.00 to 0.71      --               --     --       --
    School                0.00 (0.01)             0.70     0.00 to 5.72      --               --     --       --
    Residual (σ²_e)       0.21 (0.02)             13.67c   0.18 to 0.24      0.44 (0.03)             14.73c   0.38 to 0.50
Note. See note for Table S2. Parameter estimates for the non-significant treatment by cohort interaction for Proximal Pre-Algebra are not shown in the table, but were included in the model from which the above numbers were derived, and are described in text.