Supplemental Materials

“Does Calculation or Word-Problem Instruction Provide a Stronger Route to Prealgebraic Knowledge?”

by L. S. Fuchs et al., 2014, Journal of Educational Psychology

http://dx.doi.org/10.1037/a0036793

Study Conditions

The study conditions were business-as-usual control, 2-tiered CAL intervention, and 2-

tiered WP intervention. Tier 1 was 34 whole-class intervention lessons (2 lessons per week for

17 weeks; 40-45 min per lesson) for all students in the class. Researcher-delivered whole-class

instruction substituted for ~85 of ~300 min of classroom teachers’ weekly business-as-usual

mathematics instruction.

Tier 2 was 39 tutoring lessons (3 times per week for 13 weeks, beginning in Weeks 4-5 of

Tier 1 instruction; 2-3 children per group; 25-30 min per lesson) provided to 272 students. The

benchmark for low performance to determine eligibility for tutoring was <7 on calculation and

word-problem screeners. This yielded 320 students who were eligible for tutoring. In Cohorts 1

and 4, we accommodated more students due to additional resources (as typically done in RTI).

So CAL students scoring <7 on calculations but >7 on word problems also were eligible, as were

WP students scoring <7 on word problems but >7 on calculations. Also, in each cohort, before

finalizing tutoring decisions, we asked teachers to confirm the appropriateness of selections

based on classroom observations. With this teacher input, 50 students who were eligible for

tutoring did not receive it, and 2 students who were not eligible did receive it. So although 320

students were eligible for tutoring according to the benchmarks we had set, 272 students received

tutoring: 84 in Cohort 1 (42 in CAL; 42 in WP); 50 in Cohort 2 (25 in CAL; 25 in WP); 72 in

Cohort 3 (34 in CAL; 38 in WP); and 66 in Cohort 4 (34 in CAL; 32 in WP).
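
The eligibility rule and the reported counts can be checked with a short sketch. It is illustrative only: the function name `is_eligible` and its arguments are hypothetical and do not come from the study materials. The text does not say how a score of exactly 7 on the second screener was handled in Cohorts 1 and 4, so the code follows the wording literally (<7 on one screener and >7 on the other).

```python
# Illustrative sketch; all names are hypothetical, not from the study materials.
BENCHMARK = 7  # cutoff on both screeners, per the text


def is_eligible(cal, wp, condition, expanded_cohort=False):
    """Return True if a student meets the tutoring benchmark.

    condition is "CAL" or "WP"; expanded_cohort marks Cohorts 1 and 4,
    where the relaxed rule applied.
    """
    if cal < BENCHMARK and wp < BENCHMARK:
        return True
    if expanded_cohort:
        if condition == "CAL" and cal < BENCHMARK and wp > BENCHMARK:
            return True
        if condition == "WP" and wp < BENCHMARK and cal > BENCHMARK:
            return True
    return False


# Reconcile the reported counts.
eligible, eligible_not_tutored, ineligible_tutored = 320, 50, 2
tutored = eligible - eligible_not_tutored + ineligible_tutored

# (CAL, WP) tutored students per cohort, as reported.
by_cohort = {1: (42, 42), 2: (25, 25), 3: (34, 38), 4: (34, 32)}
cohort_total = sum(cal + wp for cal, wp in by_cohort.values())

print(tutored, cohort_total)  # 272 272
```

Both routes give 272, which matches the number of students reported as receiving tutoring.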

In this section, we describe (a) the framework for CAL and WP intervention, with

linkages in each program to pre-algebraic knowledge; (b) the nature of control group instruction

and distinctions between control and the intervention conditions; (c) the structure of whole-class

CAL and WP instruction and then CAL and WP tutoring; (d) the research assistant (RA) teachers

and tutors and how they were prepared and supported; and (e) fidelity of implementation.

Program manuals, which include lesson guides, are available from the first author, under the title

Math Wise for CAL whole-class instruction and tutoring and Pirate Math for WP whole-class

instruction and tutoring (Math Wise because the theme is owls, with allusions to becoming

“wiser” in mathematics by using appropriate calculation strategies; Pirate Math because the

theme is pirates, with allusions to finding x, the unknown, in word problems, just as x marks the

treasure on pirate maps).

Framework for CAL Intervention

CAL intervention incorporated two major emphases that reflect understanding about how

children develop competence with simple arithmetic and procedural calculations (e.g., Fuchs,

Geary, et al., 2013; Fuson & Kwon, 1992; Geary et al., 2008; Groen & Resnick, 1977; LeFevre

& Morris, 1999; Siegler & Shrager, 1984). The first emphasis was interconnected knowledge

about number (e.g., cardinality, inverse relation between addition and subtraction;

commutativity). For example, students used manipulatives to explore how a target number (e.g.,

5 in the 5 set) can be partitioned in different ways to derive adding and subtracting problems in a

set. They focused on part-whole knowledge with number families (problems in a set with the

same three numbers, e.g., 2 + 3 = 5, 3 + 2 = 5, 5 - 2 = 3, 5 - 3 = 2), grouping families and using visual

displays/blocks to show how/why four problems make a family and to explore the inverse

relation between addition and subtraction. They generated all addition and subtraction problems

(with answers) in a set, while using manipulatives to show problems. The number knowledge

emphasis in CAL also had a strong focus on tens concepts and place value. Students practiced

counting by 10s with a number list; explored relations between ones and tens and the meaning of

zeros in the ones and tens places; used and regrouped manipulatives to represent 1- and 2-digit

numbers; and identified smaller and larger numbers using place value and the number list.
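
As a rough illustration of number sets and number families, the sketch below enumerates the facts for a target number and the four facts in one family. The function names are hypothetical. Whether the program's sets include partitions with zero is an assumption here; the manuals are the authority on the taught content.

```python
def number_set(target):
    """All addition and subtraction facts whose whole is `target`.

    Assumption: partitions involving 0 are included.
    """
    facts = []
    for part in range(target + 1):
        other = target - part
        facts.append(f"{part} + {other} = {target}")
        facts.append(f"{target} - {part} = {other}")
    return facts


def number_family(a, b):
    """The four related facts built from two parts and their whole.

    For doubles (a == b) the list contains repeated facts.
    """
    whole = a + b
    return [
        f"{a} + {b} = {whole}",
        f"{b} + {a} = {whole}",
        f"{whole} - {a} = {b}",
        f"{whole} - {b} = {a}",
    ]


print(number_family(2, 3))
# ['2 + 3 = 5', '3 + 2 = 5', '5 - 2 = 3', '5 - 3 = 2']
```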

The second major emphasis in CAL intervention was practice. Students were explicitly

taught and practiced efficient counting procedures for solving 1-digit problems and 2-digit plus

1-digit problems that do not require regrouping. Practice required students to generate many

correct responses to such problems to help them form long-term representations to support

retrieval. Students were also explicitly taught and practiced efficient procedures for identifying

when regrouping was required in addition and subtraction problems and for actual regrouping.
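
One common way to decide whether regrouping is required is to inspect the ones column. The sketch below assumes nonnegative 1- and 2-digit operands and, for subtraction, a minuend at least as large as the subtrahend. It is a generic check with a hypothetical function name, not the procedure wording taught in the program.

```python
def needs_regrouping(a, b, op):
    """Whether the ones column forces regrouping in a + b or a - b.

    Only the ones column is examined; carrying from tens into hundreds
    is outside the scope of this sketch.
    """
    ones_a, ones_b = a % 10, b % 10
    if op == "+":
        return ones_a + ones_b >= 10
    if op == "-":
        return ones_a < ones_b
    raise ValueError("op must be '+' or '-'")


print(needs_regrouping(27, 5, "+"))   # True: 7 + 5 reaches a new ten
print(needs_regrouping(48, 17, "-"))  # False: 8 - 7 needs no regrouping
```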

CAL intervention was divided into six units: (a) equal sign as a relational term; (b)

addition concepts and operational strategies for problems for which retrieval is a viable strategy

(problems where both operands are 1-digit or one operand is 1 digit and the other is 2 digits but

regrouping is not required); (c) concepts and operational strategies for similar problems

involving subtraction; (d) concepts and operational strategies for addition problems with

regrouping; (e) concepts and operational strategies for subtraction problems with regrouping; and

(f) review (although cumulative review was also integrated throughout the first five units).

Framework for WP Intervention

Our framework for studying word problems was based on Kintsch and colleagues

(Cummins, Kintsch, Reusser, & Weimer, 1988; Kintsch & Greeno, 1985; Nathan, Kintsch, &

Young, 1992), who pose that word-problem solving is an interaction between problem-solving

strategies and language comprehension processes. This model assumes that general features of

the text comprehension process apply across stories, informational text, and word-problem

statements, but the comprehension strategies, the nature of required knowledge structures, and

the form of resulting macrostructures and situation and problem models differ by task. According

to this model, memory representations of word problems have three components. The first

involves constructing a coherent structure of the text’s essential ideas. The second, the situation

model, requires supplementing the text with inferences based on the child’s world knowledge;

this includes informal knowledge about conceptual relations among quantities. The problem

solver coordinates this information with the third component – problem models or schema – to

formalize the conceptual relations among quantities. The schema guides application of solution

strategies. At second grade, combine, compare, and change problem types are the major schema.

The model poses that this process of building the propositional text structure, inferencing,

identifying schema, and applying solution strategies makes strong demands on three cognitive

resources: working memory, reasoning ability, and language comprehension.

In terms of working memory and reasoning ability, consider a combine problem (two

parts are combined to make a total): Joe has 3 marbles. Tom has 5 marbles. Tom also has 2

balls. How many marbles do the boys have in all? The competent problem solver processes

sentence 1 to identify object = marbles; quantity=3; actor=Joe; but Joe’s role=unknown. This is

placed in short-term memory. In sentence 2, propositions are similarly coded and held in

memory. In sentence 3, balls fails to match the object code in sentences 1 and 2, signaling that 2

balls may be irrelevant; this is added to memory. In the question, the quantitative proposition

how many marbles and the phrase in all cue the problem solver that this problem falls in the

combine schema. So the problem solver assigns the role of superset (total) to the question;

checks information held in short-term memory to assign subset roles (the two parts); and rejects

2 balls as irrelevant. Filling in these slots of the schema in this way triggers a set of problem-

solving strategies. The hope is that with typical school instruction, children will gradually

construct the combine schema on their own, just as they devise their own strategies for handling

the demands on working memory and reasoning this problem-solving sequence involves.

This schema-based approach to WP intervention explicitly teaches children the

underlying structure of combine, compare, and change schema, using real-life scenarios and role

playing with stories that have no unknowns. Gradually, the teacher (a) transitions from complete

stories involving these informal relationships between quantities to problem statements with

missing information and (b) introduces graphic representations to formalize the quantitative

relations underlying each schema and provide opportunities for students to place knowns and

unknowns into the graphic representations. The teacher quickly transitions to “meta-equations”

that represent the schema and teaches step-by-step strategies that begin with identifying problem

statements as combine, compare, or change schema and then building the propositional text

structure. Schema-based instruction facilitates connections among the situation model, schema,

and productive solution strategies by making these connections explicit. It also provides children

with strategies that reduce demands on working memory and reasoning. The child RUNs through

the problem: Reads it, Underlines the question in which the object code (marbles) is revealed,

and Names the explicitly taught combine schema. This prompts the child to write the combine

meta-equation (P1 + P2 = T for the above problem). The child then re-reads the problem

statement. While re-reading, he/she replaces P1 and P2 with quantities for each relevant “part”

and crosses out irrelevant objects/numbers. This reduces the burden on working memory and

reasoning, as it provides the equation for problem solving and sets up the solution equation.
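
The marbles example can be traced in a small sketch. The propositions are hand-coded from the problem statement rather than parsed from text, and the function name `fill_combine` is hypothetical. The sketch only illustrates how matching on the question's object code fills the combine meta-equation and sets aside the irrelevant quantity.

```python
# Propositions hand-coded from the example: (actor, quantity, object).
propositions = [
    ("Joe", 3, "marbles"),
    ("Tom", 5, "marbles"),
    ("Tom", 2, "balls"),
]
question_object = "marbles"  # revealed by the underlined question


def fill_combine(propositions, question_object):
    """Fill P1 + P2 = T with quantities that match the question's object."""
    parts = [q for _, q, obj in propositions if obj == question_object]
    crossed_out = [p for p in propositions if p[2] != question_object]
    if len(parts) != 2:
        raise ValueError("this sketch handles two-part combine problems only")
    p1, p2 = parts
    return {
        "equation": f"{p1} + {p2} = T",
        "T": p1 + p2,
        "crossed_out": crossed_out,
    }


result = fill_combine(propositions, question_object)
print(result["equation"], "->", result["T"])  # 3 + 5 = T -> 8
print(result["crossed_out"])                  # [('Tom', 2, 'balls')]
```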

As Kintsch and colleagues discussed and Cummins et al. (1988) showed, however, word-

problem solving also relies heavily on language comprehension processes. As per Kintsch and

Greeno (1985), children “understand important vocabulary and language constructions prior to

school entry” (p. 111) and “through instruction in arithmetic and word problems, learn to treat

these words in a special, task-specific way, including extensions to ordinary usage for terms

(e.g., all or more) to more complicated constructions involving sets (in all and more than)” (p.

111). The assumption is that “students have the necessary language abilities to understand

problem statements and … form an appropriate problem model” (p. 330, Nathan et al., 1992).

But for many children, this assumption is shaky. Cummins et al. simulated incorrect problem

solving with two types of errors: incorrect math problem-solving processes versus language

processing errors. Correct problem representation depended more on language, and changing

wording in only minor ways dramatically affected accuracy. As Nathan et al. concluded,

instruction must “focus on language processes as well as the mathematical aspects of word-

problem solving” (p. 332). Our approach to schema-based instruction differs from other forms of

schema-based instruction (e.g., Jitendra, Star, Rodriguez, Lindell, & Someki, 2011; Jitendra et

al., 2009), in part, by providing explicit instruction on the language comprehension demands

specific to combine, compare, and change problem types. The purpose is to teach the subject-

matter-specific vocabulary and language constructions critical to these problem types and help

children treat this language in special, task-specific ways. The major challenges we address are

(a) underdeveloped representations of relational terminology and constructions (e.g., more/less

than; older; stronger) for compare problems; (b) discriminating relational vocabulary and

constructions from confusable ones (e.g., Tom has 5 fewer marbles than Jill, as in compare

problems, vs. Tom had 5 marbles and then he got 2 more, as in change problems); and (c)

underdeveloped representations of vocabulary related to quantities (e.g., amount refers to quantity) and

taxonomic relations at superordinate levels (e.g., 2 dogs+3 cats=5 animals; McGregor et al.,

2002), which are important for combine problems.

WP intervention was divided into five units: (a) foundational skills for the word-problem

content (i.e., equal sign as a relational term; strategies to find x; strategies for checking word-

problem work); (b) combine problems; (c) compare problems; (d) change problems; and (e)

review (although cumulative review was also integrated throughout the first four units). The

program typically provides explicit conceptual and strategy instruction on 1- and 2-digit

calculations (e.g., Fuchs et al., 2009), but for the present study, we removed all instruction on

calculations. When students asked questions or needed corrective feedback on calculations, they

were told to use the strategies they learned from their classroom teachers.

Linkages With Pre-Algebraic Knowledge

CAL and WP intervention incorporated instructional linkages to pre-algebraic

knowledge, as per Pillay et al. (1989). This occurred in two ways. First, both CAL and WP

intervention explicitly focused on understanding the equal sign as a relational symbol (Jacobs,

Franke, Carpenter, Levi, & Battey, 2007). Some work (Baroody & Ginsburg, 1983; Blanton &

Kaput, 2005) suggests that teachers’ consistent use of the phrase is the same as (instead of

equals) with young children is associated with improved understanding of the equal sign. Short-

term experiments with intermediate age students show that explicit instruction on the meaning or

location of the equal sign can enhance equal sign understanding and performance on open,

nonstandard equations (e.g., 6 + 4 + 7 = 6 + __; McNeil & Alibali, 2005; Rittle-Johnson &

Alibali, 1999). Powell and Fuchs (2010) showed that third graders with mathematics difficulty

who received schema-based tutoring plus equal-sign instruction performed better than students

who received schema-based tutoring alone on closed equations and some types of word

problems.
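
Under a relational reading of the equal sign, both sides of an open, nonstandard equation must have the same value. A minimal sketch for the example 6 + 4 + 7 = 6 + __ follows; the function name is hypothetical.

```python
def solve_open_equation(left_terms, right_known):
    """Blank that makes sum(left_terms) equal sum(right_known) + blank."""
    return sum(left_terms) - sum(right_known)


blank = solve_open_equation([6, 4, 7], [6])
print(blank)  # 11, since 6 + 4 + 7 = 17 and 6 + 11 = 17
```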

Second, as discussed, WP intervention taught children to represent the underlying

structure of schemas in terms of “meta-equations”: for combine problems, P1 + P2 = T; for

compare problems, Bigger minus Smaller = Difference (B – S = D); for change problems, Start

plus/minus Change = End (ST +/- C = E). Children were taught to identify the problem type and

write the corresponding meta-equation; re-read while replacing slots in the meta-equation with

information from the problem statement (including x for the unknown); and solve for x (x could

occur in any of the three slots of the equation). This has been shown to encourage pre-algebraic

thinking in second graders (Fuchs, Zumeta, et al., 2010). Because WP intervention provided this

additional linkage with algebraic thinking over CAL and because WPs may involve greater

symbolic complexity than calculations (as outlined in the introduction), we expected WP

intervention to produce stronger pre-algebraic knowledge than CAL.
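
Because x can occur in any slot, each meta-equation amounts to three small rearrangements. The sketch below is one way to express that, assuming exactly one unknown slot marked with `None`. The function name, the slot keys, and the `change_sign` argument are hypothetical conventions chosen for illustration.

```python
def solve_meta_equation(schema, slots, change_sign="+"):
    """Solve for the single slot whose value is None.

    schema: "combine" (P1 + P2 = T), "compare" (B - S = D),
            or "change" (ST +/- C = E).
    """
    unknown = [name for name, value in slots.items() if value is None]
    if len(unknown) != 1:
        raise ValueError("exactly one slot must be unknown")
    x = unknown[0]

    if schema == "combine":
        if x == "T":
            return slots["P1"] + slots["P2"]
        other = "P2" if x == "P1" else "P1"
        return slots["T"] - slots[other]

    if schema == "compare":
        if x == "D":
            return slots["B"] - slots["S"]
        if x == "B":
            return slots["D"] + slots["S"]
        return slots["B"] - slots["D"]  # x is S

    if schema == "change":
        sign = 1 if change_sign == "+" else -1
        if x == "E":
            return slots["ST"] + sign * slots["C"]
        if x == "ST":
            return slots["E"] - sign * slots["C"]
        return sign * (slots["E"] - slots["ST"])  # x is C

    raise ValueError("unknown schema")


# 3 + x = 8 (combine, unknown part)
print(solve_meta_equation("combine", {"P1": 3, "P2": None, "T": 8}))  # 5
# x - 3 = 4 (change with a decrease, unknown start)
print(solve_meta_equation("change", {"ST": None, "C": 3, "E": 4}, "-"))  # 7
```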

Teachers’ Classroom Instruction and Distinctions Between Control and the CAL/WP

Intervention

Classroom teachers relied primarily on the basal program Houghton Mifflin Math

(Greenes et al., 2005) to guide mathematics instruction. Their curricular content aligned with the

content in CAL intervention (1- and 2-digit adding and subtracting) and WP intervention

(combine, compare, and change word problems). In this way, control students received

calculation and word-problem instruction relevant to the study. The amount of whole-class

instruction was comparable in all three conditions, but tutored children in CAL and WP

intervention received more instruction than some of the children in the control group who would

have been eligible for tutoring (instruction was of similar time for control group students who

participated in the school’s intervention period in math). Results, however, indicated no

interaction between tutoring eligibility status and treatment condition.

Based on analysis of Houghton Mifflin Math (Greenes et al., 2005) and teacher reports,

key distinctions between the control and CAL conditions were as follows. (1) Control group

instruction did not address the equal sign as a relational term. (2) Control group instruction

focused less on number knowledge and more on procedures. (3) CAL provided greater emphasis

on development of fluency with efficient counting strategies. Important commonalities between

the control and CAL conditions were as follows: instruction addressed one problem type at a

time; provided students with explicit steps for deriving solutions; and relied on worked

examples, guided group practice, and independent practice with corrective feedback.

Key distinctions between the control and WP conditions were as follows. (1) Control

group instruction did not address the equal sign as a relational term. (2) Control group instruction

emphasized a metacognitive approach to solving word problems, in which students used guided

generic questions (not specific to problem types) to plan, solve, and reflect on the content of

word problems; WP did not employ this general set of metacognitive strategies. (3) In contrast to

WP, there was no attempt in the control condition to explicitly teach students to understand word

problems in terms of the combine, compare, or change schemas. (4) Control group instruction

provided strong emphasis on computational requirements for problem solution; WP intervention

provided none. (5) Although control group instruction allocated 3 weeks of instruction to finding

missing addends, it focused substantially less on this topic and did not connect this topic to the

structure of word problems. (6) Control group instruction taught children to rely on keywords

(e.g., more is a signal to add the numbers in the problem), a common approach in schools; WP

avoided keywords (because they only produce correct solutions ~50% of the time). Important

commonalities between the control and WP conditions were as follows: instruction addressed

one problem type at a time; focused on concepts underlying the problem type; provided students

with explicit steps for deriving solutions; and relied on worked examples, guided group practice,

and independent practice with corrective feedback.

Structure of Whole-Class CAL and WP Intervention

Whole-class CAL and WP instruction occurred for 17 weeks, two lessons per week, each

40-45 min. RA teachers delivered instruction using lesson guides, included in the program

manuals, to promote fidelity. Guides were studied, not read or memorized, to support an

authentic teaching style. As in Math Wise and Pirate Math, CAL and WP whole-class instruction

is explicit. Each session begins with the teacher explaining worked examples and moving to

guided practice, with constant student participation in applying demonstrated strategies in

partially worked and then unworked problems. This guided practice is conducted in large-group

format. Then children work cooperatively on problems in pairs, while the teacher circulates to

provide help as needed. The lesson ends with independent practice on the day’s topic. Corrective

feedback is systematically provided, and cumulative review is systematically incorporated.

In CAL whole-class instruction, this general framework is captured in four activities per

lesson. The first, the Daily Lesson, focuses on six to eight problems on the day’s topic. This

involves the teacher presenting worked examples and moving to partially worked and then

unworked problems, with constant student participation. The second activity is Partner Work, in

which paired students (a higher- and lower-performer) work in a structured manner on 12

problems on that day’s topic, as coach and player. As validated with Peer-Assisted Learning

Strategies (Fuchs et al., 1997), the higher-performing student begins as coach, who asks the

player step-by-step questions to model the teacher’s solution strategy and provides corrective

feedback. This occurs on three problems. Then the player “thinks-aloud” the solution strategy on

the next three problems, while the coach monitors and provides corrective feedback. Students

then switch roles, so the lower-performer asks his/her partner to model the solution strategy on

problems 7-9, while providing corrective feedback, and finally the higher performer talks aloud

the final three problems. The teacher circulates while providing assistance. The third activity is

Time Owl (beginning Lesson 15), which presents a page of different types of calculation

problems. The teacher provides a directive (e.g., “solve addition problems that require

regrouping”); then students have 1 min to find and complete that type of problem. At the end of

1 min, students switch Time Owls with partners and check answers. The last activity, Individual

Practice, provides students with a practice sheet of 25 problems on the day’s topic and

cumulative review. At the end of 5 min, students switch papers to check answers. During the

final three activities, students earn points for working with partners cooperatively and producing

accurate work. The pair with the most points collects Math Wise folders.
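
The rotation of roles across the 12 Partner Work problems can be laid out as a simple schedule. The sketch below encodes the sequence as described in this section; the function name and the dictionary keys are hypothetical.

```python
def partner_work_schedule(n_problems=12, block=3):
    """Coach/player roles for each problem in CAL Partner Work."""
    phases = [
        ("higher", "coach asks step-by-step questions"),
        ("higher", "player thinks aloud"),
        ("lower", "coach asks step-by-step questions"),
        ("lower", "player thinks aloud"),
    ]
    schedule = []
    for i in range(n_problems):
        coach, mode = phases[(i // block) % len(phases)]
        player = "lower" if coach == "higher" else "higher"
        schedule.append(
            {"problem": i + 1, "coach": coach, "player": player, "mode": mode}
        )
    return schedule


for row in partner_work_schedule():
    print(row["problem"], row["coach"], row["player"], row["mode"])
```

Problems 1-6 have the higher performer as coach, and problems 7-12 have the lower performer as coach, with the think-aloud phase on the second half of each block.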

WP whole-class instruction also includes four activities per lesson. The first is the

Teacher-Led Problem, in which the teacher reviews and leads discussion about a problem from

the previous day’s lesson. Each student has a copy of that problem, which the teacher and

students work through together. The second activity is the Daily Lesson, focused on that day’s

topic. This involves the teacher presenting worked examples and moving to partially-worked and

then to unworked problems, with constant student participation. The third activity is Partner

Work, in which students work in pairs to solve two word problems on the day’s topic. After ~8

min, the teacher shares answers with the class, against which the pair compares its work. The

final activity is Individual Practice, in which students complete five find x problems and one

word problem. As each student finishes, the teacher checks his/her work. Students earn points for

each part of a correctly applied solution strategy and record scores on a “treasure map.”

Structure of CAL and WP Tutoring

CAL and WP tutoring provides 3 sessions per week, each for 25-30 min for 13 weeks

(beginning in Weeks 4-5 of whole-class instruction), in a quiet space outside the classroom. Each

group includes 2-3 students. The content of tutoring mirrors the content of whole-class

instruction. However, tutoring targets the most difficult concepts from classroom instruction;

incorporates manipulatives more frequently; employs additional scaffolding to support learning;

teaches additional strategies; and incorporates self-regulated learning strategies with tangible

reinforcement. Throughout each lesson, tutors continuously engage students by posing questions

to one student or the group. Tutors set a timer at random intervals (as prescribed in the lesson

guide) and award a checkmark if all students are on-task (e.g., following directions, working

hard) each time the timer beeps. (Students do not know when the timer will beep.) At the end of

lessons, tutors also award individual points for correct answers on “bonus problems,” which are

prescribed to tutors in the lesson guide. (Students do not know which problems will earn points

until the end of the lesson.) Students color an owl/pirate for each checkmark earned. When 16

owls/pirates are colored, they pick a small prize.

Each CAL tutoring lesson comprises six activities. The first is Number Combination

Flash Cards (through Lesson 13, the focus is addition; then addition and subtraction). The tutor

shows cards to students in round robin format for 1 min. When an error occurs, responding

pauses while the student uses the taught counting strategy to derive the correct answer. At the

end of 1 min, correctly answered cards are counted. Then the deck is reshuffled for another 1-

min round, in which students try to beat that score. The group graphs the higher score. The

second activity is the Number Combination Lesson. The tutor reviews number knowledge

content (e.g., cardinality, mental representation of the number line, inverse relation between

addition and subtraction, commutativity for addition) and efficient counting strategies (i.e., min

strategy for adding; missing addend strategy for subtracting). Lessons begin with +1/-1, moving

to +0/-0 and then doubles. Next, lessons focus on number combinations sets and families, using

manipulatives to build connections. The third activity is Owl Belly, for which each student’s

worksheet shows eight owls, each with a number on its belly. The tutor reads a problem (e.g., 9 +

3); students color the owl with the answer (12). Tutors present six number combinations so two

owl bellies remain uncolored. Then, students create one addition problem and one subtraction

problem for the remaining owls. For the fourth activity, Double-Digit Flash Cards, students

respond in round-robin style for 1 min by saying whether each problem requires addition or

subtraction (Lessons 1-21) or whether the problem requires regrouping and explaining why

(Lessons 22-39). In the next activity, the Double-Digit Lesson, tutors review stories and rhymes,

taught in whole-class instruction, which capture place value concepts and the number knowledge

basis for regrouping. Then students complete six related problems, using Base-10 blocks to

illustrate regrouping concepts. The final activity is Paper-and-Pencil Review. Students have 1

min to complete 10 1-digit problems and 2 min to complete five 2-digit problems. Tutors then

provide corrective feedback.
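
The two counting strategies named in the Number Combination Lesson can be sketched as follows. This reflects the general definitions (for the min strategy, start at the larger addend and count up the smaller one; for the missing addend strategy, count up from the subtrahend to the minuend). The function names are hypothetical, and the exact wording taught to students is in the program manuals.

```python
def min_strategy(a, b):
    """Add by starting at the larger addend and counting up the smaller one.

    Returns the sum and the sequence of numbers counted aloud.
    """
    start, count = max(a, b), min(a, b)
    total = start
    said = []
    for _ in range(count):
        total += 1
        said.append(total)
    return total, said


def missing_addend(minuend, subtrahend):
    """Subtract by counting up from the subtrahend to the minuend."""
    counted = 0
    current = subtrahend
    while current < minuend:
        current += 1
        counted += 1
    return counted


print(min_strategy(3, 9))     # (12, [10, 11, 12])
print(missing_addend(12, 9))  # 3
```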

Each WP tutoring lesson comprises four activities. The first is the Game, which differs by lesson

topic: Find X Game, Total Game (filling in equations based on orally-presented problems), or

Difference Game (underlining and drawing > or < between two compared amounts in word

problems). In the second activity, the Daily Lesson, students complete 3-4 word problems using

strategies taught in whole-class instruction or tutoring; most problems are aligned with the unit’s

topic, but cumulative review across problem types is incorporated. With the third activity, the

Sorting Game, the tutor reads word problems; students name schemas in round-robin style for 2

min; the tutor provides corrective feedback. For the final activity, Paper-Pencil Review, students

work individually on find x problems for 2 min and a word problem for 2 min. Tutors provide

corrective feedback.

Research Assistant (RA) Teachers and Tutors

Across the four cohorts, the typical RA was one to two years beyond undergraduate

education, studying for a graduate degree in education, special education, counseling, or

education policy. Ten RA-teachers and 30 RA-tutors worked 1 year; five RA-teachers and four

RA-tutors worked more than one year. One full-time project coordinator also served as an RA-

teacher, as she led the RA-teachers; another served as an RA-tutor, while she led the RA-tutors.

Different sets of RAs conducted whole-class instruction versus tutoring. To control for RA

quality, all RA-teachers taught classrooms in the CAL and WP conditions; all RA-tutors taught

groups in the CAL and WP conditions. We guarded against contagion by color coding lesson

guides and teaching materials by condition; reviewing distinctions between the two conditions in

weekly meetings; and providing corrective feedback on live observations and audiotapes of

whole-class and tutoring sessions.

To prepare and support RAs, we began with an introductory meeting, in which we

discussed research ethics and procedures, including professional behavior in schools. Then RAs

were prepared for their specific role (teacher or tutor) in two full-day sessions, in which RAs

were introduced to the project and its goals, assigned roles (teacher or tutor), and provided

instruction, demonstrations, and lesson guides to study. Next, they were paired to practice

designated first-unit lessons. Then, they conducted one lesson for a project coordinator (they did

not know ahead which lesson) and were judged on a point-by-point system for fidelity. RAs who

achieved 95% fidelity were deemed reliable. Those who scored below 95% were coached on

missed points, required to practice more, and then re-rated at a later time on another lesson. This

recurred until the RA met criteria. At weekly meetings, project coordinators and the first author

met for problem-solving sessions separately with RA-teachers and with RA-tutors. At the

beginning of each unit, a 3-hour session oriented RAs. Also, all sessions were audiotaped, and a

sample of live observations was conducted for each RA. On that basis, project coordinators

provided written and oral corrective feedback and one-to-one retraining as needed. RA-teachers

and RA-tutors met periodically to share observations of students they shared between whole-

class instruction and tutoring.

Fidelity

Prior to the first whole-class and tutoring session, research staff agreed on the essential

information in each lesson and made a checklist of points for each lesson. This was done for

CAL and WP whole-class instruction and tutoring. Each session was audiotaped. At the study’s

end, RAs independently listened to a random sample of tapes while completing checklists to

identify the percentage of points addressed. We sampled 20% of whole-class instruction tapes

equitably within conditions, RA-teachers, classrooms, and lesson types; we sampled 20% of

tutoring tapes equitably within conditions, RA-tutors, tutoring groups, and lesson types. For

whole-class intervention, the mean percentage of points addressed was 95.87 (SD = 1.40) for

CAL and 94.86 (SD = 1.85) for WP, t(16) = 1.70, p = .110. For tutoring, the mean percentage of

points addressed was 96.06 (SD = 2.83) for CAL and 96.34 (SD = 3.28) for WP, t(35) = 0.58, p =

.563. (In these fidelity analyses, teacher/tutor was the unit of analysis; t tests are for dependent

samples because each teacher/tutor taught in both conditions.)
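
For readers unfamiliar with the dependent-samples t test used here, the sketch below computes the statistic from paired per-RA fidelity percentages. The data are hypothetical values invented for illustration and are NOT study data; the function name is also hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """Dependent-samples t statistic and degrees of freedom."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all paired differences are identical")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical fidelity percentages for three RAs (NOT study data).
cal = [96.0, 95.0, 97.0]
wp = [94.0, 95.0, 95.0]
t, df = paired_t(cal, wp)
print(round(t, 2), df)  # 2.0 2
```

Because degrees of freedom equal the number of pairs minus one, the reported t(16) and t(35) correspond to 17 and 36 paired units under this formula.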

Supplemental Results

Supplemental analyses were conducted, analogous to those presented in the published

manuscript. Given the quasi-experimental nature of those results (see text), here we provide

results from the subsample that did not encounter randomization issues. This subsample was of

55 teachers and 483 students in 18 schools. The majority of the excluded sample was from the

first cohort of students.

Table S1 shows raw score and standard score means and SDs for all individual measures

that were combined into the outcomes. As with the sample as a whole, there were no significant

differences as a function of treatment condition on any pretest measure. Tables S2 to S4 show

results for the calculations, word-problems, and pre-algebraic knowledge, respectively. These

tables followed directly from those in the published manuscript, for maximum comparability.

One consistent effect in the results that follow is that standard errors were larger, particularly for random

effects and especially for school, which is not surprising given the smaller sample;

in addition, there were fewer effects of specific included covariates. We did run exploratory

analyses that eliminated non-significant effects in this smaller sample, and others that eliminated

the random effect of school, but in no case did these change the treatment effect presented below,

except where indicated. In general, results were highly similar to those of the full sample.

Therefore, descriptions below focus on differences.

Proximal Calculation Effects (Table S2, Left Side)

For the unconditional model, the ICCs were .13 for teacher and .03 for school. Model fit

statistics: deviance = 1330.1, parameters = 3, AIC = 1336.1, BIC = 1338.7. In the final full

model, there were significant effects for pretest, p < .001, for cohort, p < .001, and for treatment

condition, F(2, 54.1) = 4.42, p < .017. Follow up to the treatment effect indicated CAL

outperformed control (p < .006; ES = 0.47); the difference between CAL and WP was now p

< .054, with ES = 0.29; as in the original analyses, WP and control did not differ (p = .256;

ES = 0.17). Model fit statistics for the full/final model were: deviance = 1024.8, parameters = 2,

AIC = 1028.8, BIC = 1030.6.
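
The ICCs reported above are each level's share of the total variance in the unconditional model. A minimal sketch, assuming the conventional variance-partition definition and using the rounded proximal-calculation variance components from Table S2 (teacher = 0.13, school = 0.03, residual = 0.84); the function name `icc` is illustrative.

```python
def icc(variances):
    """Return each level's proportion of the total variance."""
    total = sum(variances.values())
    return {level: value / total for level, value in variances.items()}


# Rounded unconditional-model variance components (Table S2, proximal side).
unconditional = {"teacher": 0.13, "school": 0.03, "residual": 0.84}
shares = icc(unconditional)
print({level: round(share, 2) for level, share in shares.items()})
```

Because the inputs are rounded to two decimals, shares computed this way can differ slightly from ICCs computed from unrounded estimates.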

Distal Calculation Effects (Table S2, Right Side)

The unconditional model showed an ICC of .07 for teacher and .02 for school (model fit:

deviance = 1239.4, parameters = 3, AIC = 1245.4, BIC = 1248.1). In the final model (fit:

deviance = 1030.1, parameters = 2, AIC = 1034.1, BIC = 1035.9), there were significant effects

for pretest (p < .001), ethnicity (p < .001), RFL status (p < .016), tutoring eligibility status (p

< .002), and treatment condition, F(2, 42.2) = 9.84, p < .001. Unlike the model of the full

sample, here sex (p < .116) was not significant, but cohort was, F(2, 49.5), p < .001; cohorts 3

and 4 outperformed cohort 2. Follow up to the treatment effect indicated CAL outperformed WP

(p < .011; ES = 0.24) and control (p < .001; ES = 0.39), but now WP outperformed control as

well (p = .042; ES = 0.15)—but did not if sex was eliminated from the model (p = .053).
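
The fit statistics reported for the calculation models are consistent with the usual definition AIC = deviance + 2 × (number of parameters). The sketch below checks that arithmetic for the four calculation models reported above; the helper name `aic` is illustrative, and BIC is left out because its sample-size term is not stated in the text.

```python
def aic(deviance, n_params):
    """Akaike information criterion from a deviance (-2 log-likelihood)."""
    return deviance + 2 * n_params


# (label, deviance, parameters, reported AIC), copied from the text above.
models = [
    ("proximal, unconditional", 1330.1, 3, 1336.1),
    ("proximal, final", 1024.8, 2, 1028.8),
    ("distal, unconditional", 1239.4, 3, 1245.4),
    ("distal, final", 1030.1, 2, 1034.1),
]

# Flag any model whose reported AIC differs from the recomputed value
# by more than the rounding in the reported figures.
mismatches = [
    label for label, deviance, k, reported in models
    if abs(aic(deviance, k) - reported) > 0.05
]
print(mismatches)
```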

Proximal Word-Problem Effects (Table S3, Left Side)

The unconditional model showed an ICC of .34 for teacher and 0 for school (model fit: [...]). In

the final model (fit: deviance = 2907.7, parameters = 3, AIC = 2913.7, BIC = 2916.4), there were significant effects

for pretest (p < .001), sex (p < .023), and treatment condition, F(2, 41.8) = 47.25, p < .001. As in

the full sample, follow up to the treatment effect indicated that WP outperformed CAL (p < .001;

ES = 1.21) and control (p < .001; ES = 1.16), but the latter two groups did not differ (p = .656;

ES = -0.05). Unlike the model of the full sample, here tutoring eligibility status was not

significant (p = .178), and neither was cohort, p = .250. There remained a significant interaction

of treatment and cohort, p < .003, which suggested relative differences between WP and the

other groups as a function of cohort; however, WP consistently outperformed the other two

groups in each cohort, though the difference between WP and control was not significant in

cohort 2.

Distal Word-Problem Effects (Table S3, Right Side)

The unconditional model showed an ICC of 0 for teacher and .06 for school (model fit: [...]). In

the final model (fit: deviance = 763.0, parameters = 2, AIC = 767.0, BIC = 768.8), there were significant effects for

pretest (p < .001), tutoring eligibility status (p < .001), ESL status (p < .002), and ethnicity (p

< .001), but not for cohort, p = .901, or for treatment, p = .898 (as was the case with the full sample,

ES range, .03 to .15). Unlike the full sample model, the prior effects for sex (p = .139) and for

RFL (p = .058) were not significant.

Proximal Pre-Algebraic Knowledge Effects (Table S4, Left Side)

The unconditional model showed an ICC of .68 for teacher and 0 for school (model fit:

deviance = 842.4, parameters = 2, AIC = 846.4, BIC = 848.1). In the final model (fit: deviance =

627.1, parameters = 3, AIC = 633.1, BIC = 635.8), there was a significant effect for the

word-problem pretest measures (p < .001), although no longer for the computation pretest (p = .178),

RFL status (p = .160), ESL status (p = .140), or ethnicity (p = .067). Tutoring eligibility status

and cohort were also not significant (both p > .05), as in the full sample model, and the

interaction of treatment and cohort was also no longer significant (p = .376). However, as in the

full sample model, treatment was significant, F(2, 33.8) = 77.42, p < .001. Follow up to the

treatment effect indicated WP outperformed CAL (p < .001; ES = 1.46) and control (p < .001;

ES = 1.63), but the latter two groups did not differ (p = .232; ES = 0.16).

Distal Pre-Algebraic Knowledge Effects (Table S4, Right Side)

The unconditional model showed an ICC of .02 for teacher and .01 for school (model fit: [...]). In

the final model (fit: deviance = 931.5, parameters = 1, AIC = 933.5, BIC = 934.4), there were significant effects for

both pretests (both p < .001), ethnicity (p < .001), and tutoring eligibility status

(p < .010), though no longer for sex (p = .351), ESL status (p = .198), or cohort (p = .556). As in

the full sample, there was a significant treatment effect, F(2, 434) = 3.72, p < .025. WP

outperformed CAL (p < .048; ES = 0.25) and control (p < .012; ES = 0.28); the latter two groups

did not differ (p = .385; ES = 0.03).

Table S1

Performance Data by Treatment Condition (n = 55 Teachers, n = 483 Students)

                                CAL                               WP                                Control
Variable                        Pre              Post             Pre              Post             Pre              Post
                                Mean (SD)        Mean (SD)        Mean (SD)        Mean (SD)        Mean (SD)        Mean (SD)

Screening
  Sums to 12                    8.43 (4.73)      NA               8.31 (4.75)      NA               8.89 (5.19)      NA
  Story Problems                6.57 (3.71)      NA               6.70 (3.47)      NA               6.60 (3.47)      NA
  WASI IQ                       92.73 (13.47)    NA               93.99 (13.39)    NA               91.95 (12.20)    NA
Calculation Outcomes
  Proximal
    Sums to 18                  6.34 (3.72)      12.40 (5.35)     5.95 (3.60)      11.43 (5.06)     5.71 (3.53)      10.22 (5.48)
    Minuends to 12              4.53 (2.59)      8.93 (5.49)      4.62 (2.88)      8.39 (4.41)      4.29 (2.81)      7.70 (4.71)
    Minuends to 18              3.37 (2.72)      7.89 (4.44)      3.04 (2.80)      7.19 (4.43)      2.92 (2.38)      6.09 (4.84)
    2-Digit Addition            3.57 (3.71)      15.52 (4.80)     3.10 (3.33)      12.23 (5.79)     3.64 (3.39)      11.42 (6.69)
    2-Digit Subtraction         2.05 (2.50)      9.05 (5.18)      2.13 (2.49)      6.52 (4.03)      2.37 (2.71)      6.88 (4.60)
  Distal
    WRAT-Raw                    19.11 (2.69)     23.73 (2.05)     19.41 (2.46)     23.24 (2.19)     19.00 (2.71)     22.71 (2.76)
        -Standard               93.56 (12.36)    102.73 (10.24)   94.63 (12.74)    100.59 (10.41)   92.31 (13.33)    98.41 (12.66)
    Key Math Addition-Raw                        9.72 (2.22)                       9.04 (2.33)                       8.82 (2.49)
        -Standard                                109.16 (12.80)                    105.86 (13.69)                    104.27 (14.31)
    Key Math Subtraction-Raw                     6.07 (2.39)                       5.45 (2.34)                       5.42 (2.40)
        -Standard                                103.25 (14.61)                    100.75 (14.30)                    99.73 (15.22)
Word-Problem Outcomes
  Proximal
    2nd Grade Story Problems    8.23 (5.00)      12.13 (6.18)     8.04 (5.28)      19.61 (7.63)     7.29 (3.99)      12.38 (5.88)
  Distal
    Key Math Prob Solving-Raw   2.29 (1.65)      3.63 (2.11)      2.35 (1.58)      3.83 (1.89)      2.38 (1.86)      3.58 (1.97)
        -Standard               103.38 (9.65)    105.03 (10.96)   104.17 (9.58)    107.03 (10.54)   103.96 (10.99)   104.58 (11.03)
    Iowa-Raw                    NA               16.50 (5.53)     NA               16.47 (5.23)     NA               16.45 (5.14)
        -Standard               NA               156.78 (19.22)   NA               156.66 (18.18)   NA               156.54 (17.48)
Pre-Algebra
  Proximal
    Find X                      NA               4.34 (3.06)      NA               7.00 (1.46)      NA               4.37 (3.09)
    Number Sentences            NA               1.17 (1.61)      NA               3.58 (2.53)      NA               0.83 (1.36)
  Distal
    Dynamic Assessment          NA               7.98 (4.38)      NA               8.90 (4.43)      NA               8.09 (4.89)
    Test of Pre-Algebra         NA               10.92 (5.30)     NA               11.83 (4.84)     NA               10.46 (5.15)

Table S2

Model Characteristics for Proximal and Distal Calculation Outcomes (n = 55 Teachers, n = 483 Students)

                            Proximal                                           Distal
Parameter                   Estimate (S.E.)  df     t/z/F    95% CI            Estimate (S.E.)  df     t/z/F    95% CI

Unconditional Model
  Fixed
    Intercept (γ000)        0.10 (0.08)      10.9   1.18     -0.08 to 0.27     0.02 (0.06)      10.2   0.29     -0.11 to 0.15
  Random
    Teacher (σ²_u0)         0.13 (0.05)             2.40b    0.06 to 0.35      0.05 (0.03)             1.63a    0.02 to 0.29
    School                  0.03 (0.05)             0.65     0.01 to 103.0     0.01 (0.02)             0.46     0.00 to 146
    Residual (σ²_e)         0.84 (0.06)             14.61c   0.74 to 0.97      0.72 (0.05)             14.58c   0.63 to 0.83
Full/Final Model
  Fixed
    Intercept (γ000)        -0.26 (0.11)     49.7   -2.25a   -0.49 to -0.03    0.13 (0.14)      96.1   0.96     -0.14 to 0.40
    Calculations Pretest    0.61 (0.04)      434    14.15c   0.52 to 0.69      0.35 (0.04)      428    8.26c    0.30 to 0.44
    Tutoring Eligibility    -0.05 (0.09)     434    -0.49    -0.23 to 0.14     -0.29 (0.09)     427    -3.09b   -0.48 to -0.11
    Sex (Female)            NA                                                 0.11 (0.07)      422    1.57     -0.03 to 0.25
    Lunch (Free/Reduced)    NA                                                 -0.27 (0.11)     321    -2.43a   -0.48 to -0.06
    Ethnicity (A Amer)      NA                                                 -0.30 (0.11)     363    -2.83b   -0.51 to -0.09
    Ethnicity (Hispanic)    NA                                                 -0.01 (0.11)     383    -0.05    -0.23 to 0.22
    Ethnicity (Other)       NA                                                 0.42 (0.16)      416    2.70b    0.12 to 0.73
    Treatment (CAL)         0.41 (0.14)      58.3   2.88b    0.12 to 0.69      0.46 (0.11)      48.1   4.36c    0.25 to 0.67
    Treatment (WP)          0.16 (0.14)      56.5   1.15     -0.12 to 0.44     0.22 (0.10)      50.7   2.08a    0.01 to 0.43
    Cohort (2)              -0.07 (0.14)     61     -0.51    -0.34 to 0.20     -0.29 (0.10)     41.7   -2.83b   -0.49 to -0.08
    Cohort (3)              0.41 (0.12)      58.9   3.36b    0.17 to 0.66      0.13 (0.09)      51.6   1.47     -0.04 to 0.32
  Random
    Teacher (σ²_u0)         0.09 (0.03)             2.79b    0.05 to 0.21      0.01 (0.02)             0.44     0.00 to 407
    School                  --               --     --       --                --               --     --       --
    Residual (σ²_e)         0.51 (0.03)             13.88c   0.44 to 0.59      0.55 (0.04)             13.69c   0.48 to 0.64

Note. a = p < .05; b = p < .01; c = p < .001. CAL is calculation intervention. WP is word-problem intervention. Proximal and distal outcomes are standardized composites. Covariance terms are not shown. For the unconditional model, t values are reported. For the full/final model, F values are reported for the fixed portion of the model and z values for the random portion. For dfs, numerators (not included) are all 1; denominators (reported) use the Kenward-Roger approximation. Under Fixed Effects, intercept refers to the outcome variable. Pretest is grand-mean centered. The code 0 is for the treatment control group, for no free lunch, for Caucasian, for ineligible for tutoring, and for cohort 4. For the meaning of symbols under Random Effects, see text, although symbols in the table stand for variance components (e.g., σ²_e denotes the student-level residual variance).
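
As a reading aid for the fixed-effect rows of Table S2, the test statistic is roughly the estimate divided by its standard error, and the interval is roughly the estimate plus or minus two standard errors. The sketch below applies this to the proximal Treatment (CAL) row (estimate 0.41, S.E. 0.14; reported statistic 2.88, CI 0.12 to 0.69). It assumes a normal critical value of 1.96 rather than the Kenward-Roger degrees of freedom used in the study, and the table entries are rounded, so the results only approximate the reported values. The function name `wald_summary` is illustrative.

```python
def wald_summary(estimate, se, crit=1.96):
    """Return the estimate/SE ratio and an approximate 95% interval."""
    ratio = estimate / se
    half_width = crit * se  # assumption: normal critical value, not a t quantile
    return ratio, (estimate - half_width, estimate + half_width)


# Proximal Treatment (CAL) row of Table S2, as rounded in the table.
ratio, (low, high) = wald_summary(0.41, 0.14)
print(f"ratio = {ratio:.2f}, interval = {low:.2f} to {high:.2f}")
```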

Table S3

Model Characteristics for Proximal and Distal Word-Problem Outcomes (n = 55 Teachers, n = 483 Students)

                            Proximal                                           Distal
Parameter                   Estimate (S.E.)  df     t/z/F    95% CI            Estimate (S.E.)  df     t/z/F    95% CI

Unconditional Model
  Fixed
    Intercept (γ000)        14.98 (0.66)     55.8   22.63c   13.65 to 16.30    -0.03 (0.06)     14     -0.48    -0.17 to 0.11
  Random
    Teacher (σ²_u0)         20.00 (4.78)            4.18c    13.16 to 34.03    --               --     --       --
    School                  --               --     --       --                0.04 (0.03)             1.65a    0.02 to 0.23
    Residual (σ²_e)         38.76 (2.66)            14.57c   34.03 to 44.54    0.65 (0.04)             15.12c   0.57 to 0.74
Full/Final Model
  Fixed
    Intercept (γ000)        11.56 (1.02)     33     11.36c   9.49 to 13.63     0.23 (0.11)      66.9   2.07a    0.01 to 0.45
    Word-Problem Pretest    5.08 (0.35)      459    14.46c   4.39 to 5.78      0.75 (0.04)      454    20.14c   0.68 to 0.83
    Tutoring Eligibility    -0.86 (0.64)     460    -1.35    -2.12 to 0.39     -0.23 (0.06)     458    -3.58c   -0.35 to -0.10
    Sex (Female)            1.06 (0.46)      444    2.28a    0.15 to 1.97      -0.07 (0.05)     453    -1.48    -0.17 to -0.02
    Lunch (Free/Reduced)    NA                                                 -0.14 (0.07)     451    -1.90    -0.29 to 0.00
    Ethnicity (A Amer)      NA                                                 -0.19 (0.08)     439    -2.49a   -0.33 to -0.04
    Ethnicity (Hispanic)    NA                                                 0.07 (0.09)      441    0.86     -0.10 to 0.24
    Ethnicity (Other)       NA                                                 0.18 (0.11)      456    1.62     -0.04 to 0.39
    Second Language         NA                                                 -0.26 (0.08)     457    -3.18b   -0.42 to -0.10
    Treatment (CAL)         1.33 (1.34)      51.4   1.00     -1.35 to 4.01     0.02 (0.07)      441    0.29     -0.12 to 0.18
    Treatment (WP)          8.31 (1.37)      53.3   6.07c    5.56 to 11.05     0.03 (0.07)      455    0.46     -0.11 to 0.17
    Cohort (2)              1.47 (1.77)      37.4   0.84     -2.10 to 5.06     0.04 (0.10)      23.1   0.41     -0.17 to 0.25
    Cohort (3)              0.98 (1.53)      38.2   0.64     -2.11 to 4.07     0.04 (0.10)      16.8   0.41     -0.16 to 0.24
  Random
    Teacher (σ²_u0)         [...]                   1.70a    0.87 to 10.67     --               --     --       --
    School                  1.21 (1.11)             1.09     0.35 to 28.90     0.02 (0.01)             1.78a    0.01 to 0.08
    Residual (σ²_e)         24.50 (1.70)            14.41c   21.48 to 28.20    0.26 (0.02)             14.98c   0.23 to 0.30

Note. See note for Table S2. Parameter estimates for the significant treatment by cohort interaction for Proximal Word Problems are not shown in the table, but were included in the model from which the above numbers were derived, and are described in text.

Table S4

Model Characteristics for Proximal and Distal Pre-Algebra Outcomes (n = 55 Teachers, n = 483 Students)

                            Proximal                                           Distal
Parameter                   Estimate (S.E.)  df     t/z/F    95% CI            Estimate (S.E.)  df     t/z/F    95% CI

Unconditional Model
  Fixed
    Intercept (γ000)        -0.03 (0.06)     56.2   -0.47    -0.15 to 0.09     -0.00 (0.05)     11.2   0.09     -0.11 to 0.12
  Random
    Teacher (σ²_u0)         [...]                   4.41c    0.12 to 0.30      0.02 (0.03)             0.71     0.00 to 17.25
    School                  --               --     --       --                0.01 (0.02)             0.57     0.00 to 384
    Residual (σ²_e)         0.27 (0.02)             14.56c   0.24 to 0.31      0.77 (0.05)             14.53c   0.68 to 0.89
Full/Final Model
  Fixed
    Intercept (γ000)        -0.22 (0.10)     57.8   -2.20a   -0.43 to -0.02    0.09 (0.10)      434    0.87     -0.11 to 0.29
    Word-Problem Pretest    0.32 (0.04)      419    8.14c    0.24 to 0.40      0.55 (0.06)      434    9.90c    0.44 to 0.66
    Calculations Pretest    0.04 (0.03)      427    1.35     -0.02 to 0.10     0.21 (0.04)      434    4.99c    0.13 to 0.29
    Tutoring Eligibility    -0.07 (0.06)     428    -1.10    -0.19 to 0.05     -0.22 (0.09)     434    -2.58a   -0.39 to -0.05
    Sex (Female)            NA                                                 -0.06 (0.06)     434    -0.93    -0.18 to -0.07
    Lunch (Free/Reduced)    -0.10 (0.07)     370    -1.41    -0.24 to 0.04     NA
    Second Language         -0.11 (0.08)     374    -1.48    -0.26 to 0.04     -0.13 (0.10)     434    -1.29    -0.33 to 0.07
    Ethnicity (A Amer)      -0.08 (0.07)     393    -1.19    -0.22 to 0.05     -0.31 (0.09)     434    -3.42c   -0.49 to -0.13
    Ethnicity (Hispanic)    0.09 (0.08)      382    1.08     -0.07 to 0.24     -0.00 (0.10)     434    -0.00    -0.20 to 0.20
    Ethnicity (Other)       0.08 (0.11)      410    0.76     -0.12 to 0.28     0.22 (0.13)      434    1.61     -0.05 to 0.48
    Treatment (CAL)         0.15 (0.11)      43.3   1.36     -0.07 to 0.38     0.08 (0.09)      434    0.87     -0.10 to 0.26
    Treatment (WP)          0.77 (0.12)      46.4   6.66c    0.54 to 1.00      0.23 (0.09)      434    2.52a    0.05 to 0.40
    Cohort (2)              -0.04 (0.14)     32.7   -0.26    -0.32 to 0.25     0.01 (0.09)      434    0.13     -0.16 to 0.18
    Cohort (3)              0.02 (0.12)      38.2   0.15     -0.23 to 0.27     0.08 (0.08)      434    1.01     -0.07 to 0.23
  Random
    Teacher (σ²_u0)         [...]                   0.91     0.00 to 0.71      --               --     --       --
    School                  0.00 (0.01)             0.70     0.00 to 5.72      --               --     --       --
    Residual (σ²_e)         0.21 (0.02)             13.67c   0.18 to 0.24      0.44 (0.03)             14.73c   0.38 to 0.50

Note. See note for Table S2. Parameter estimates for the non-significant treatment by cohort interaction for Proximal Pre-Algebra are not shown in the table, but were included in the model from which the above numbers were derived, and are described in text.