Productive failure in learning the concept of variance


Manu Kapur

Received: 21 April 2011 / Accepted: 7 March 2012 / Published online: 22 March 2012
© Springer Science+Business Media B.V. 2012

Instructional Science (2012) 40:651–672. DOI 10.1007/s11251-012-9209-6

Manu Kapur, Curriculum, Teaching and Learning, Learning Sciences Laboratory, National Institute of Education, Nanyang Technological University, 1 Nanyang Walk, Singapore 637616, Singapore. E-mail: [email protected]

Abstract In a study with ninth-grade mathematics students on learning the concept of variance, students experienced either direct instruction (DI) or productive failure (PF), wherein they were first asked to generate a quantitative index for variance without any guidance before receiving DI on the concept. Whereas DI students relied only on the canonical formulation of variance taught to them, PF students generated a diversity of formulations for variance but were unsuccessful in developing the canonical formulation. On the posttest, however, PF students significantly outperformed DI students on conceptual understanding and transfer without compromising procedural fluency. These results challenge the claim that there is little efficacy in having learners solve problems targeting concepts that are novel to them, and that DI needs to happen before learners solve problems on their own.

Keywords Problem solving · Productive failure · Multiple representations · Mathematics · Classroom-based research

Introduction

Proponents of direct instruction (DI) bring to bear substantive empirical evidence against unguided or minimally guided instruction to claim that there is little efficacy in having learners solve problems that target novel concepts, and that learners should receive DI on the concepts before any problem solving (Sweller 2010; Kirschner et al. 2006). Kirschner et al. (2006) argued that "Controlled experiments almost uniformly indicate that when dealing with novel information, learners should be explicitly shown what to do and how to do it" (p. 79). Commonly cited problems with unguided or minimally guided instruction include increased working memory (WM) load that interferes with schema formation (Sweller 1988), encoding of errors and misconceptions (Brown and Campione 1994), lack of adequate practice and elaboration (Klahr and Nigam 2004), as well as affective problems of frustration and de-motivation (Hardiman et al. 1986).

Consequently, this has led to a commonly held belief that there is little efficacy in

having learners solve novel problems that target concepts they have not learnt yet. Perhaps

this belief is best captured by Sweller (2010): "What can conceivably be gained by leaving the learner to search for a solution when the search is usually very time consuming, may result in a suboptimal solution, or even no solution at all?" (p. 128). The basis for this

belief comes from a large body of empirical evidence that has compared some form of

heavily guided DI (e.g., worked examples) favorably with unguided or minimally guided

discovery learning instruction (Kirschner et al. 2006). It is of course not surprising that

learners do not learn from unguided or minimally guided discovery learning when compared with a heavily guided DI. However, the conclusion that there is little efficacy in

having learners solve problems that target concepts they have not learnt yet—something

that they have to do in unguided discovery learning—does not follow.

To determine if there is such an efficacy, a stricter comparison for DI would be to

compare it with an approach where students first generate representations and methods for novel problems on their own, followed by DI. It can be expected that the generation process

will likely lead to failure. By failure, I simply mean that students will not be able to

develop or discover the canonical solutions by themselves. Yet, what is critical is not the

failure to develop the canonical solution per se but the very process of generating and exploring multiple representations and solution methods (RSMs), which can be productive for learning provided that DI on the

targeted concepts is subsequently provided (Kapur and Bielaczyc 2011; Schwartz and

Martin 2004).

This paper explores the possibility of affording learners the opportunity to engage in a

process of generating solutions to novel problems, and shows how this process invariably

leads to suboptimal solutions (that is, failure to generate the canonical solutions) but can

still be a productive exercise in failure provided some form of DI follows (Kapur 2009,

2010). Thus argued, instead of reporting yet another experiment comparing discovery

learning with DI, the work presented herein seeks to understand whether combining the

two—as instantiated in the learning design I call productive failure (PF) (Kapur 2008;

Kapur and Bielaczyc 2012)—can be more effective than DI alone.

I start with a brief review of two bodies of research: one that supports the case for DI

and another that argues for PF, and points to an efficacy of learner-generated solutions

provided an appropriate form of DI builds upon it. Following this, I present empirical

evidence from a classroom-based study that compares the efficacy of a PF design with a DI

design in learning the concept of variance by ninth-grade mathematics students. I end by

discussing the findings and drawing implications for theory and research.

The case for DI

Research on worked examples—often epitomized as strongly guided instruction—constitutes perhaps the strongest case for DI (Sweller 2010). An extensive body of empirical

work suggests that when dealing with novel concepts, learners who only solve problems

perform worse than those who solve problems after studying equivalent worked examples

(Sweller and Cooper 1985; Cooper and Sweller 1987). The argument is that pure problem-solving search, especially means-ends analysis, imposes a heavy load on a limited WM capacity, thereby reducing resources that can be devoted to schema acquisition. In contrast, learning with worked examples reduces cognitive load on WM, thereby allowing

more resources for schema acquisition. The superior effect of worked examples has been

demonstrated in several studies (Carroll 1994; Paas 1992; Paas and van Merrienboer 1994;

Trafton and Reiser 1993). For example, Paas (1992) found that students who were given

fully or partially worked examples learnt more than those who solved problems. However, this study found no marginal benefit of providing a fully worked example over a partially worked-out one.

One of the more direct comparisons between DI and discovery learning comes from

an often-cited study by Klahr and Nigam (2004) on learning the control of variables strategy (CVS) in scientific experimentation. On the acquisition of basic CVS

skill as well as ability to transfer the skill to evaluate the design of science experiments,

their findings suggested that students in the DI condition who were explicitly taught how

to design un-confounded experiments outperformed their counterparts in the discovery

learning condition who were simply left alone to design experiments without any

instruction or feedback. Further experiments by Klahr and colleagues have largely bolstered the case for the ineffectiveness of discovery learning compared with DI (e.g., Strand-Cary and Klahr 2008).

It follows then that for learners dealing with novel concepts, DI through worked examples seems superior to discovering or constructing solutions by themselves without any instruction whatsoever. All the above studies have largely compared some version(s) of a worked example or strong instructional guidance condition with a pure

discovery condition. However, the above findings do not necessarily imply that there is

little efficacy in having learners solve novel problems, that is, problems that target

concepts they have not learnt yet (Schmidt and Bjork 1992). As argued earlier, to

determine if there is such an efficacy, a stricter comparison is needed wherein DI is

compared with an approach where students first generate RSMs on their own followed

by DI. Evidence from several studies supports this contention (Kapur 2008, 2009, 2010;

Schwartz and Bransford 1998; Schwartz and Martin 2004), and thus forms the focus of

the following section.

The case for failure in learning and problem solving

Research on impasse-driven learning (VanLehn et al. 2003) with college students in coached problem-solving situations provides strong evidence for the role of failure in learning. Successful learning of a principle (e.g., a concept, a physical law) was associated with events when students reached an impasse during problem solving. Conversely, when students did not reach an impasse, learning was rare despite explicit tutor explanations of the target principle. Rather than providing immediate instruction or DI upfront (e.g., in the form of feedback, questions, or explanations) when the learner demonstrably makes an error or is "stuck," VanLehn et al.'s (2003) findings suggest that it may well be more productive to delay that instruction until the student reaches an impasse—a form of failure—and is unable to generate an adequate way forward.

Building on this, Mathan and Koedinger (2003) compared learning under two different

feedback conditions on student errors. In the immediate feedback condition, a tutor gave

immediate feedback on student errors. In the delayed feedback condition, the tutor allowed

the student to detect their own error first before providing feedback. Their findings suggested that students in the delayed feedback condition demonstrated a faster rate of learning on subsequent problems. Delayed feedback on errors seemed to

have resulted in better retention and better preparation to learn from subsequent problems

(Mathan and Koedinger 2003).

Further evidence for such preparation for future learning (PFL; Schwartz and Bransford

1998) can be found in the inventing to prepare for learning (IPL) research by Schwartz and

Martin (2004). In a sequence of design experiments on the teaching of descriptive statistics

with intellectually gifted students, Schwartz and Martin (2004) demonstrated an existence

proof for the hidden efficacy of invention activities when such activities preceded DI,

despite such activities failing to produce canonical conceptions and solutions during the

invention phase. However, the proponents of DI have criticized PFL and IPL studies

because of a lack of adequate control and experimental manipulation of one variable at a

time, which makes it difficult to make causal attributions of the effects (Kirschner et al.

2006).

Earlier experiments on PF (Kapur 2008) provide evidence from randomized-controlled experiments for the role of failure in learning and problem solving by delaying structure. Kapur (2008) examined students solving complex problems without the provision of any external support structures or scaffolds. Eleventh-grade student triads from seven high schools in India were randomly assigned to solve either ill- or well-structured physics problems in an online chat environment. Ill-structured groups generated a greater diversity of representations and methods for solving the ill-structured problems. However, ill-structured group discussions were found to be more complex and divergent than

those of their well-structured counterparts, leading to poor group performance. After

group problem solving, all students individually solved well-structured test problems

followed by ill-structured test problems. Notwithstanding their poor group performance,

students from ill-structured groups outperformed those from well-structured groups in

individually solving both well- and ill-structured test problems subsequently. These

findings suggested a hidden efficacy in the complex, divergent interactional process even

though it seemingly led to failure in the ill-structured groups initially. Kapur argued that

delaying the structure in the ill-structured groups helped them discern how to structure

an ill-structured problem, thereby facilitating a spontaneous transfer of problem-solving

skills. Findings from this study have since been replicated (Kapur and Kinzer 2009).

These findings are consistent with other research programs that suggest that conditions

that maximize learning in the longer term are not necessarily the ones that maximize

performance initially (Clifford 1984; Schmidt and Bjork 1992). For example, Schmidt and

Bjork (1992) conceptualized the notion of "desirable difficulties" to argue that introducing "difficulties" during the learning phase, for example, by delaying feedback or increasing

task complexity, can enhance learning insofar as learners engage in processes (e.g.,

assembling different facts and concepts into a schema, generating and exploring the

affordances of multiple representations and methods) that are germane for learning. Collectively, these central findings can reasonably be reinterpreted as pointing to the

efficacy of learner-generated processing, conceptions, representations, and understandings,

even though such conceptions and understandings may not be correct initially, and the process of arriving at them may not be efficient. The above findings, while preliminary,

underscore the implication that delaying instructional support—be it explanations, feedback, DI, or well-structured problems—in learning and problem-solving activities so as to

allow learners to generate solutions to novel problems can be a productive exercise in

failure (Kapur 2008; Kapur and Rummel 2009).


Designing for PF

There are at least two problems with DI in the initial phase of learning something new or

solving a novel problem. First, students often do not have the necessary prior knowledge

differentiation to be able to discern and understand the affordances of the domain-specific

representations and methods underpinning the targeted concepts given during DI (e.g.,

Kapur and Bielaczyc 2012; Schwartz and Bransford 1998; Schwartz and Martin 2004).

Second, when concepts are presented in a well-assembled, structured manner during DI,

students may not understand why those concepts, together with their representations and

methods, are assembled or structured in the way that they are (Chi et al. 1988; Schwartz

and Bransford 1998).

Cognizant of these two problems, PF engages students in a learning design that

embodies four core, interdependent mechanisms: (a) activation and differentiation of prior

knowledge in relation to the targeted concepts, (b) attention to critical conceptual features

of the targeted concepts, (c) explanation and elaboration of these features, and (d) organization and assembly of the critical conceptual features into the targeted concepts (for a fuller explication of the design principles, see Kapur and Bielaczyc 2012). These mechanisms are embodied in a two-phase design: a generation and exploration phase (Phase 1)

followed by a consolidation phase (Phase 2). Phase 1 affords opportunities for students to

generate and explore the affordances and constraints of multiple RSMs. Phase 2 affords

opportunities for organizing and assembling the relevant student-generated RSMs into

canonical RSMs.

The designs of both phases were guided by the following core design principles that

embody the abovementioned mechanisms:

(1) create problem-solving contexts that involve working on complex problems that challenge but do not frustrate, rely on prior mathematical resources, and admit multiple RSMs (mechanisms a and b);
(2) provide opportunities for explanation and elaboration (mechanisms b and c);
(3) provide opportunities to compare and contrast the affordances and constraints of failed or sub-optimal RSMs and the assembly of canonical RSMs (mechanisms b–d).

In this paper, my goal is to compare the efficacy of a PF design with a DI design. In the

following section, I describe a study with ninth-grade mathematics students from a public

school in Singapore who experienced a PF or a DI design for learning the concept of

variance.

In light of the mechanisms embodied in the PF design, I hypothesized that students in the PF condition would be able to generate and explore various RSMs (Bielaczyc and Kapur 2010; diSessa et al. 1991) but would not be successful in developing or discovering the canonical formulation on their own (Kirschner et al. 2006), and in this sense would fail.

However, the PF design would be better at engendering the necessary prior knowledge

differentiation (mechanism a), which may help them learn the canonical RSMs when

explained by the teacher during DI subsequently (Schwartz and Bransford 1998; Schwartz

and Martin 2004). Though not tested in this study, I conjectured that by comparing the

student-generated with the canonical RSMs, students may attend to the critical features of

the canonical RSMs when the teacher explains and elaborates upon them (mechanisms b

and c). This may result in better knowledge assembly (mechanism d), which in turn may

lead to better conceptual understanding and transfer performance. In terms of procedural

fluency in computing and interpreting variance, however, I did not expect any differences


between the PF and DI designs because the formulation of variance is relatively easy to

compute and interpret.
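For reference, the canonical formulation referred to throughout is the standard one (my gloss; the paper does not reproduce the formula at this point). For data points $x_1, \ldots, x_n$ with mean $\bar{x}$,

$$\mathrm{variance} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \mathrm{SD} = \sqrt{\mathrm{variance}},$$

using the divide-by-$n$ convention typically taught at this level (a sample-based formulation divides by $n - 1$ instead).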

Method

Participants

A total of 133 ninth-grade mathematics students (14–15 years old) from an all-boys public school in Singapore participated in this study. Students were almost all of Chinese ethnicity. Students came from four intact mathematics classes: two classes were taught by one teacher (teacher A), and the other two by another teacher (teacher B). The concept of variance is typically taught in the 10th grade; therefore, students had no instructional experience with the targeted concept—variance—prior to the study, although they had learnt the concepts of mean, median, and mode in grades 7 and 8.

Research design

A pre-post quasi-experimental design was used. For each teacher, one class was assigned to

the DI condition, and the other to the PF condition.

Pretest

As a measure of prior knowledge, all students took a five-item, paper-and-pencil pretest (α = .75) on the prerequisite concepts of central tendencies (2 items) and distributions (2 items), as well as on the targeted concept of variance (1 item), 1 week before the intervention (see example items in Appendix B). Two experienced raters independently scored

students’ solutions with an inter-rater reliability of .98, and all disagreements were

resolved via discussion with the author.

Solutions were scored as incorrect (0 points), partially correct (1 or 2 points), or fully correct (4 points). Partially correct solutions were those that demonstrated correct representational and strategy deployment but with computational errors (2 points). Partially

correct solutions could also be incomplete solutions with correct representational and

strategy deployment (1 point). The overarching emphasis was on conceptual knowledge,

and computational errors were penalized only minimally (maximum 10 % of the maximum

possible score of 4), and penalties for computational errors were not carried forward. To

allow for ease of comparison, the score for each item was scaled linearly onto 10; the pretest score was calculated by averaging the item scores. There was no significant difference

between the two conditions on the pretest, F(1,124) = .468, p = .495, and not a single

student demonstrated canonical knowledge of the concept of variance.
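To make the scoring arithmetic concrete, here is a minimal sketch (my illustration, not the authors' procedure or code) of how item scores of 0, 1, 2, or 4 points scale onto 10 and average into a test score:

def scale_item(raw, max_raw=4):
    # Linearly scale an item score (0, 1, 2, or 4 points) onto a 0-10 range.
    return raw / max_raw * 10

def test_score(item_scores):
    # Overall test score = average of the scaled item scores.
    return sum(scale_item(s) for s in item_scores) / len(item_scores)

# e.g., a five-item pretest scored 4, 2, 1, 0, 4 yields (10 + 5 + 2.5 + 0 + 10) / 5:
print(test_score([4, 2, 1, 0, 4]))  # 5.5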

Intervention

All students participated in four 50-min periods of instruction on the concept, as appropriate to their assigned condition.

In the PF condition, students spent the first two periods working face-to-face in triads to

solve a data analysis problem on their own (see Appendix A). The data analysis problem

presented a distribution of goals scored each year by three soccer players over a 20-year period. Students were asked to design a quantitative index to determine the most consistent


player. During this generation phase, no instructional support or scaffolds were provided.

Note, however, that the design of the PF tasks and activity structures embodied the specific design principles stated earlier. In the third period, the teacher first consolidated by

comparing and contrasting student-generated solutions with each other, and then modeled

and worked through the canonical solution as in the DI condition. In the fourth and final

period, students solved three data analysis problems for practice, and the teacher discussed

the solutions with the class.

In the DI condition, the teacher used the first period to explain the canonical formulation of the concept of variance using two sets of "worked example followed by problem solving" pairs. The data analysis problems required students to compare the variability in

2–3 given data sets, for example, comparing the variability in rainfall in two different

months of a year. After each worked example, students solved an isomorphic problem,

following which their errors, misconceptions, and critical features of the concept were

discussed with the class as a whole. To motivate students to pay attention and remain

engaged, they were told that they will be asked to solve isomorphic problems after the

teacher-led worked examples. In the second period, students were given three isomorphic

data analysis problems to solve, and the solutions were discussed by the teacher. In the

third period, students worked in triads to solve the same problem that the PF students

solved in the first two periods, following which the teacher discussed the solutions with the

class. DI students did not need two periods to solve the problem because they had already

learnt the concept. The DI cycle ended with a final set of three data analysis problems for

practice (the same problems given to the PF students), which the students solved individually, and the teacher discussed the solutions with the class. Furthermore, after the

second and the fourth periods, DI students were given three isomorphic data analysis

problems for homework, that is, six homework problems in total. In comparison, not only did the PF students not receive any homework, they also solved fewer

data analysis problems overall than their counterparts in the DI condition because they

spent the first two periods generating an index for variance.

After the second and fourth periods, students from both conditions took a five-item, five-point (1 = low to 5 = high) Likert-scale engagement survey (α = .79). The survey

comprised items like "I participated in the lesson's activities," "I was attentive during the lesson," and so on. Table 1 summarizes the tasks and activities in the two conditions.

Table 1 Tasks and activities in the PF and DI conditions

Period 1.
PF: Students work in triads to generate RSMs for a novel, complex problem (see Appendix A).
DI: Teacher explains the concept, then models and explains the canonical RSMs using two sets of "worked example followed by individual problem solving" pairs.

Period 2.
PF: Students continue their work in triads to generate solutions to the novel, complex problem.
DI: Students work individually to solve three more isomorphic problems; the teacher goes through the solutions with the whole class after each problem.

Period 3.
PF: Teacher consolidates; models and explains the canonical RSMs for the novel problem; students work individually.
DI: Students work in triads to solve the same problem solved by PF students in the first two periods (see Appendix A), following which the teacher explains the solution.

Period 4.
Both conditions: Students work individually to solve three isomorphic problems for practice in class, and the teacher discusses the solutions with the class as a whole.


Process measures for the PF condition

Each PF group was given A4 sheets of blank paper for their group work. All group

discussions were captured in audio and transcribed by a research assistant. The group work

artifacts and the discussion transcripts were used to determine the maximal set of RSMs

generated by the PF groups using an analytical scheme developed in previous work on PF

(Kapur and Bielaczyc 2012).

The set of RSMs identified in the group work artifacts was used to chunk the group

discussion into smaller episodes. For example, if the group work artifacts revealed that the

group used graphs to solve the problem, then the relevant episode from the discussion

where the group discussed the graphical method was identified. Chunking of a discussion

into episodes was simplified by the fact that there were generally clear transitions in the

discussions when a group moved from one RSM (e.g., central tendencies) to another (e.g.,

graphing). Episodes containing additional RSMs not captured in the group work artifacts

were also identified. In accordance with the hypothesis, the analysis was focused squarely

on RSMs, and episodes of non-task behavior and social talk were not included in the

analysis. This process was repeated for all the PF groups. Two raters independently

chunked the group transcripts into episodes and coded the episodes into RSM type with

inter-rater reliabilities (Krippendorff’s alphas) of .97 and .92 respectively. As in previous

work (Kapur and Bielaczyc 2012), RSM diversity was taken to be a measure of knowledge

activation and differentiation, and was defined as the total number of different RSMs

generated by a group; the higher this number, the greater the RSM diversity.

Process measures for the DI condition

Student problem-solving worksheets from their classroom work were used to note the kinds

of RSMs (canonical or non-canonical) students used to solve the isomorphic problems.

Performance from the homework assignments provided a proxy measure for student per-

formance in the DI condition. The homework problems were scored by the teacher as either a

1 (if answered correctly) or 0 (if answered incorrectly). Computational or calculation errors

were not penalized given the focus on conceptual understanding. The average percentage

score for the homework problems was taken as a measure of DI student performance.

Posttest

On the day immediately after the intervention, all students took a six-item, paper-and-pencil posttest (α = .78). The posttest comprised three types of items (see

example items in Appendix C):

i. three items on procedural fluency,
ii. two items on conceptual understanding, and
iii. one item on transfer (requiring the development of a normalized score for comparing incommensurable distributions, even though this was not taught during instruction; the item thus required students to flexibly adapt and build upon the concepts they had learnt during instruction—deviation from the mean and SD—and assemble them as a ratio; a plausible form is sketched after this list).
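One plausible form of such a normalized score (my illustration; the item itself is only excerpted in Appendix C) is the z-score-like ratio

$$z = \frac{x - \bar{x}}{\mathrm{SD}},$$

which expresses a score's deviation from its distribution's mean in units of that distribution's SD, thereby making scores from incommensurable distributions directly comparable.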

A focus on procedural fluency, conceptual understanding, and transfer is consistent with

the argument that, minimally, a learning design should not only help students become

fluent in computing and interpreting the canonical RSMs (procedural fluency), but also


understand why the RSMs are formulated the way they are (conceptual understanding), as

well as flexibly adapt them to solve problems that were not targeted during instruction

(transfer).

Two raters independently scored the items with an inter-rater reliability of .96, and all

disagreements were resolved via discussion with the author. As was the case with the

pretest, solutions were scored as incorrect (0 points), partially correct (1 or 2 points), or

fully correct (4 points). Partially correct solutions were those that demonstrated correct

representational and strategy deployment but with computational errors (2 points). Partially

correct solutions could also be incomplete solutions with correct representational and

strategy deployment (1 point). The overarching emphasis was on conceptual knowledge,

and computational errors were penalized only minimally (maximum 10 % of the maximum

score on an item, i.e., 4), and penalties for computational errors were not carried forward.

The score for each item was scaled linearly onto 10; scores for each of the three types of

items were obtained by averaging the constituent item scores. Performance on the three

types of items formed the three dependent variables.

For the purposes of this paper, only the hypotheses relating to the learning outcomes

(procedural fluency, conceptual understanding, and transfer) were tested, along with indirect evidence for the mechanism of prior knowledge activation and differentiation by

looking at the relationship between RSM diversity and learning outcomes.

Results

Process results

PF groups’ RSM diversity

PF groups generated on average seven RSMs, M = 7.18, SD = 2.08. To alleviate a

concern that the number of RSMs generated by groups may not adequately reflect RSM

diversity, a bottom-up categorization procedure was used to examine the full set of student-generated RSMs based on the mathematical approaches and concepts they deployed,

resulting in the following four major categories: (a) central tendencies, (b) qualitative/

graphing methods, (c) frequency/counting methods, and (d) deviation methods.

Category 1: central tendencies

Groups started by using mean, median, and in some cases, mode for data analysis. This was

not surprising because students had been taught these concepts in the earlier grades.

However, relying on central tendencies alone, it was not possible to generate a quantitative

index for variance because the problem was designed to keep the central tendencies invariant (indeed, all three players' distributions in Appendix A share the same mean and median of 14 goals).

Category 2: qualitative/graphing methods

Groups generated graphical and tabular representations that organized the data visually and

were able to discern which player was more consistent. The visual representations (see

Fig. 1) afforded a qualitative comparative analysis between the players, but did not provide

a quantitative index for consistency, even though the ideas of spread and clustering are important qualitative conceptual underpinnings for the concept of variance.

[Fig. 1 Examples of qualitative representations and methods: trend lines, cumulative trend line, frequency table, dot diagrams and frequency polygons, box plot]


Category 3: frequency/counting methods

Groups built on the qualitative methods to develop frequency-based measures of consistency. For example, as in Fig. 2, groups used the frequency of goals scored within certain

intervals to argue that the player with the highest number of goals in the interval

containing the mean was the most consistent. Other groups counted the frequency with

which a player scored above, below, and at the mean. Frequency methods demonstrated

that students could quantify the clustering trends that the qualitative representations

revealed.
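The two counting strategies described above can be made concrete with a short sketch (my reconstruction of the kinds of indices groups generated, not the students' actual work):

def freq_in_interval(goals, low, high):
    # (a) Number of years with goals inside an interval chosen to contain the
    # mean; the more years inside, the more consistent the player.
    return sum(low <= g <= high for g in goals)

def freq_about_mean(goals):
    # (b) Counts of years in which a player scored above, below, and at the mean.
    mean = sum(goals) / len(goals)
    above = sum(g > mean for g in goals)
    below = sum(g < mean for g in goals)
    at = sum(g == mean for g in goals)
    return above, below, at

# First ten years of one player's data from Table 4 (Appendix A):
mike = [14, 9, 14, 10, 15, 11, 15, 11, 16, 12]
print(freq_in_interval(mike, 11, 15))  # 7 years within 11-15 goals
print(freq_about_mean(mike))           # (5, 5, 0)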


Category 4: deviation methods

Figure 3 presents some examples of the deviation methods. The simplest deviation method

generated was the range (Deviation method 1, or simply D1). Some groups calculated the

sum of year-on-year deviations (D2) to argue that the greater the sum, the lower the

consistency. Among these, there were those who considered absolute deviations (D3) to

avoid deviations of opposite signs cancelling each other—an important conceptual leap

towards understanding variance. Finally, there were some groups who calculated deviations about the mean (D4), only to find that they sum to zero. For both the D3 and D4 categories, some groups further refined their method to consider not the sum of the deviations but their average (D5).

[Fig. 3 Examples of deviation-based representations and methods: range (D1); sum of year-on-year deviations (D2); average of year-on-year absolute deviations (D3, D5); sum of deviations about the mean (D4)]
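In the same spirit, the deviation-based indices D1–D5 can be sketched as follows (again my reconstruction, intended only to make the categories concrete):

def d1_range(goals):
    # D1: range of the distribution.
    return max(goals) - min(goals)

def d2_sum_year_on_year(goals):
    # D2: sum of year-on-year differences; deviations of opposite signs cancel.
    return sum(goals[i + 1] - goals[i] for i in range(len(goals) - 1))

def d3_sum_abs_year_on_year(goals):
    # D3: sum of absolute year-on-year differences, avoiding cancellation.
    return sum(abs(goals[i + 1] - goals[i]) for i in range(len(goals) - 1))

def d4_sum_deviations_about_mean(goals):
    # D4: sum of deviations about the mean; always (numerically) zero.
    mean = sum(goals) / len(goals)
    return sum(g - mean for g in goals)

def d5_average_abs_year_on_year(goals):
    # D5: the average, rather than the sum, of the absolute deviations
    # (analogous refinements were applied to D4).
    return d3_sum_abs_year_on_year(goals) / (len(goals) - 1)

mike = [14, 9, 14, 10, 15, 11, 15, 11, 16, 12]
print(d1_range(mike), d2_sum_year_on_year(mike), d5_average_abs_year_on_year(mike))

Note how D2 exposes the cancellation problem that D3 repairs: for the data above, D2 telescopes to the last value minus the first (12 − 14 = −2) even though the year-on-year swings are large.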

These four categories of student-generated RSMs suggested that not only were students’

priors activated (central tendencies, graphing, differences, etc.) but that students were able

to assemble them into different ways of measuring consistency. Categorization analysis

found that all groups generated RSMs in each of the Categories 1–3 (each with mode = 2)

but not all reached Category 4 (mode = 1). Thus, even though RSM diversity was operationalized as the total number of different RSMs generated by a group, categorization analysis adds another layer of description showing that the RSMs did not just come from

one or two categories but were generally spread across the categories. Consistent with the

hypothesis, none of the groups were able to develop, let alone use, the canonical formulation on their own. More importantly, these student-generated RSMs support the hypothesis that students would in fact be able to generate a diversity of RSMs to solve the problem without having first learnt the targeted concept of variance.

[Fig. 2 Examples of frequency representations and methods: frequency of years within selected intervals; frequency of years above, below, and at average]

Process measures for the DI condition

Students’ classroom work revealed that all students relied only on the canonical formu-

lation to solve data analysis problems. This was not surprising given that the canonical

formulation is relatively easy to compute and apply. Furthermore, all DI students, working

in triads, were able to successfully solve the data analysis problem that the PF students

solved during the generation phase—a success rate of 100 % as compared to 0 % in the PF

condition, where none of the triads was able to develop the canonical solution. This performance success was corroborated further by performance on the homework problems. The

average performance (i.e., percentage of problems solved correctly) on the homework

problems was high, M = 93.2 %, SD = 5.3 %.

Engagement ratings for the PF and DI conditions

Finally, Table 2 shows that there was generally a high level of engagement in all classrooms. However, there was no significant difference between the conditions on the self-reported engagement ratings.

Table 2 Summary of self-reported engagement ratings in the PF and DI conditions

Engagement rating 1 (after the 2nd period): PF n = 74, M = 3.84, SD = .51; DI n = 65, M = 3.83, SD = .44; F(1, 137) = .047, p = .828
Engagement rating 2 (after the 4th period): PF n = 74, M = 3.70, SD = .63; DI n = 65, M = 3.62, SD = .67; F(1, 137) = .516, p = .474

Note: Engagement was reported on a five-item, five-point (1–5) Likert-scale survey; the higher the score, the greater the engagement.

To sum up, these process findings serve as a manipulation check demonstrating that

students in the PF condition experienced "failure," at least in the conventional sense of not being able to develop the canonical solution on their own, whereas DI students demonstrated successful application of the canonical formulation to solve several data analysis

problems, including the one that the PF students solved during the generation phase.

Furthermore, the engagement ratings also suggest that DI students were not only experiencing performance success but were reportedly just as engaged as PF students in

their lessons.

Outcome results

Taking the three types of posttest items as the dependent variables, a MANCOVA (see Footnote 1 below) was carried out with condition (PF vs. DI) as the between-subjects factor and prior knowledge as the covariate. Controlling for the multivariate effect of prior knowledge, F(3, 128) = 2.29, p = .082, the MANCOVA revealed a statistically significant multivariate effect of condition (PF vs. DI) on posttest scores, F(3, 128) = 37.42, p < .001, partial η² = .47. The interaction between prior knowledge and experimental condition was not significant.

Univariate ANCOVAs in Table 3 show that PF students significantly outperformed DI students on conceptual understanding and transfer without compromising procedural fluency.

Table 3 Summary of posttest performance

Procedural fluency: PF M = 7.02 (SD = 1.05), DI M = 7.07 (SD = 1.69); F(1, 130) ns
Conceptual understanding: PF M = 8.76 (SD = 2.40), DI M = 4.37 (SD = 2.43); F(1, 130) = 102.59, p < .001, η² = .44
Transfer: PF M = 5.88 (SD = 2.32), DI M = 3.25 (SD = 2.31); F(1, 130) = 39.21, p < .001, η² = .23

Note: Means are covariate-adjusted.

Footnote 1: Initially, a condition (PF vs. DI) by teacher (teacher A vs. teacher B) MANCOVA was carried out with prior knowledge as the covariate. However, the main and interaction effects of teacher were not significant. Hence, the teacher (or class) factor was collapsed, and a more straightforward MANCOVA with condition as the sole between-subjects factor is reported.

Variation within the PF condition

Taking the number of different RSMs—RSM diversity—generated by PF groups as an

indirect measure of prior knowledge activation and differentiation, I examined whether the

number of RSMs generated by each group relates to the subsequent posttest performance


by members of that group. Data for this analysis comes from the two intact PF classes

because the DI classes did not generate any RSMs prior to instruction.

Taking the three types of posttest items as the dependent variables, a MANCOVA was

carried out with class as the between-subjects factor (to parse out variance due to class),

and prior knowledge and the number of RSMs generated as the two covariates (in that

order). The multivariate effects of class, F(3, 65) = 1.897, p = .139, and prior knowledge,

F(3, 65) = 1.839, p = .149, were both not significant. However, there was a significant

multivariate effect of the number of student-generated RSMs on the three types of posttest

items, F(3, 65) = 6.715, p = .001, partial η² = .24. Univariate analysis revealed that the number of student-generated RSMs had a significant and positive impact on:

i. procedural fluency, F(1, 67) = 4.292, p = .042, partial η² = .06;
ii. conceptual understanding, F(1, 67) = 16.146, p < .001, partial η² = .19; and
iii. transfer, F(1, 67) = 10.017, p = .002, partial η² = .13.

Note that the findings did not change when the analysis was done at the group level, that

is, when relating the average of the group members' performance on the posttest to the

number of solutions they generated as a group, while controlling for class and their average

performance on the pretest.

Discussion

This study compared the effectiveness of a PF design with a DI design for learning the concept of

variance. Findings suggested that PF students significantly outperformed their DI counterparts on conceptual understanding and transfer without compromising procedural fluency. Further analyses revealed that RSM diversity was a significant predictor of

procedural fluency, conceptual understanding, and transfer.

These findings are consistent with previous studies on PF with other mathematical

topics and profiles of students (Kapur 2008, 2009, 2010; Kapur and Bielaczyc 2011), and

also with other studies described earlier in the paper (e.g., Schwartz and Bransford 1998;

Schwartz and Martin 2004). More broadly, findings are consistent with Schmidt and

Bjork’s (1992) review of psychological science research on motor and verbal learning.

They argued that, under certain conditions, introducing "difficulties" during the training

phase, for example, by delaying feedback or increasing task complexity, can enhance

learning insofar as learners engage in processes (e.g., assembling different facts and

concepts into a schema, generating and exploring the affordances of multiple representations and methods) that are germane for learning.


Explaining PF

As hypothesized, the PF design invoked learning processes that activated and differentiated

students’ prior knowledge as evidenced by the diversity of student-generated RSMs. After

all, students could only rely on their priors—formal and intuitive—in generating these

RSMs. Therefore, the more they could generate, the more it can be argued that they were

able to conceptualize the targeted concept in different ways, that is, their priors were not

only activated but also differentiated in the process of generation. In other words, the

generated RSMs can be seen as a measure, albeit indirect, of knowledge activation and

differentiation; the greater the number of such RSMs, the greater the knowledge activation

and differentiation.

The importance of RSM generation was further evidenced by the finding that RSM

diversity was a significant predictor of learning outcomes on the posttest. In other words,

not only are students able to generate RSMs, but the more they generate, the more they

seem to learn from PF. This of course opens the door to the argument that if exposure to

student-generated RSMs is what is essential, then instead of getting students to generate

RSMs, why not simply let students study the student-generated solutions first (e.g., in the

form of well-designed worked examples) and then give them the canonical RSMs through

DI? Simply put, is it really necessary for students to generate the RSMs or can these be

given to them? I have begun to explore this question, and initial experiments suggest that

generation of RSMs is still more effective than studying and evaluating them (Kapur and

Bielaczyc 2011), which is also consistent with Roll’s (2009) work in this area.

In terms of the affordances of the design, note that whereas PF students had the

opportunity to work with not only the RSMs that they generated but also the canonical

RSMs that they received during DI, DI students worked with only the canonical ones.

Hence, DI students worked with a smaller diversity of RSMs, and consequently, their

knowledge was arguably not as differentiated as that of their PF counterparts. Granted that using

RSMs is at best an indirect measure of prior knowledge activation and differentiation, this

was still a critical difference between the two conditions by design.

Limitations

Because this was a classroom-based, quasi-experimental study, there are inherent limitations in the causal attribution of effects to design elements. While both conditions targeted the same concepts

and had the same teacher, they differed in terms of the timing of the DI on the canonical

concept, the number of worked examples and practice problems (which the DI condition

had more of), the inclusion of a consolidation activity in the PF condition wherein the

teacher compared the student-generated RSMs, and the proportion of time spent individually versus in group work. Therefore, strict causal attribution remains necessarily tentative.

Furthermore, the scope of inference is limited by the participant school being an all-boys

school.

Finally, one could also argue that perhaps the DI condition was simply a case of poor

instruction, which may have resulted in students not paying attention to the worked

examples and teacher explanations. To address this concern, note that: (a) after learning and applying the concept in the first two periods, all DI students could successfully solve the complex problem using the canonical formulation, the very problem for which none of the PF students could develop a canonical formulation; (b) performance on

homework assignments was high, and (c) there was no significant difference between the

two conditions on the self-reported engagement ratings. Therefore, taking data on student


engagement together with the homework and collaborative problem-solving performance helps mitigate

the concern that the DI condition might simply have been a case of poor instruction.

Implications

Findings from this study, though tentative, point to a move away from simplistic comparisons between discovery learning and DI, and in doing so contribute to the ongoing

debate comparing the effectiveness of DI with discovery learning approaches (e.g., Tobias

and Duffy 2010); proponents of DI continue to epitomize discovery learning as the constructivist ideal (e.g., Kirschner et al. 2006; Klahr and Nigam 2004).

It is perhaps worth clarifying that a commitment to a constructivist epistemology does

not necessarily imply a commitment to discovery learning. Simply leaving learners to

generate and explore without consolidating is unlikely to lead to learning, or at least

learners cannot be expected to "discover" the canonical representations by themselves, as

indeed my findings suggest. Instead, a commitment to a constructivist epistemology

requires that we build upon learners’ prior knowledge. However, one cannot build upon

prior knowledge if one does not know what this prior knowledge is in the first place.

Importantly, this emphasis on prior knowledge is critical even from the perspective of

cognitive load theory (CLT). As indeed Kirschner et al. (2006) argued, "Any instructional theory that ignores the limits of WM when dealing with novel information or ignores the disappearance of those limits when dealing with familiar information is unlikely to be effective" (p. 77).

If what a learner already knows—prior knowledge—about a concept is a critical

determinant of either limiting or expanding the WM capacity as conceptualized by CLT,

then does not a commitment to CLT entail a commitment to understanding whether and to

what extent the targeted concept is novel to the learner? If one defines novelty via the

canonical lens, then one is constrained to work within the limiting aspects of the WM,

which is what the proponents of DI largely seem to have done (e.g., Carroll 1994; Sweller

and Cooper 1985; Paas 1992).

However, if we allow for the concomitant possibility that learners may have some prior

knowledge and resources about a concept they have yet to learn, could we not design tasks

and activity structures to elicit this knowledge, and by activating and working with these

priors in the long-term memory, leverage the expandable aspects of WM capacity? At the

very least, this is a theoretical possibility that the CLT allows for. Better still, evidence

from CLT researchers in fact supports this contention. For example, when students were

asked to learn to use a database program, Tuovinen and Sweller (1999) found that students

who had prior domain familiarity with databases learnt as much from exploration practice

(based on discovery learning principles) as they did from worked-examples practice.

However, for those with low or no domain familiarity, worked examples practice was

superior. They argued that high domain familiarity helped students draw on prior

knowledge of the domain to guide their exploration thereby reducing cognitive load.

Therefore, whether one makes a commitment to the information-processing perspective undergirding CLT or to the constructivist perspective, the primacy of what a learner already knows—formally or intuitively—seems to be common and important to both. It

follows that at the very least the burden on the designer is to first understand the nature of

learners’ prior knowledge structures; the very structures upon which the claimed ‘‘build-

ing’’ will be done. Designing for PF presents one way of doing so, wherein students first

generate and explore representations and methods, and in the process externalize their prior

knowledge structures, before DI.


On this note, it would be useful to revisit Klahr and Nigam’s (2004) study cited earlier in

the paper. One could argue that their study supports the above contention although it is often

cited as a stellar example of the superior effectiveness of DI over discovery learning. A careful

reading of the study suggests that before assigning students to either a DI or a discovery

learning condition, Klahr and Nigam conducted a baseline assessment where they asked

students to design four experiments on their own to see if they knew the control of

variables strategy (CVS) principle for designing un-confounded experiments. As expected,

only 8 out of the 112 students were able to design four un-confounded experiments, that is, the

success rates before any instruction on CVS were very low. Students who were subsequently

assigned to the discovery learning condition simply continued to design these experiments

but without any instruction on CVS or any feedback. However, for students in the DI condition, the instructor modeled and contrasted the design of both confounded and unconfounded experiments, with appropriate instructional facilitation and explanation to make them

attend to critical features of why CVS helps isolate the effects of a factor whereas confounded

experiments do not. It was not surprising therefore that Klahr and Nigam found DI to be more

effective than discovery learning as described earlier in this paper.

From the perspective of PF however, the baseline assessment in Klahr and Nigam’s (2004)

study seems to function very much like the generation and exploration phase where students

generate their own structures (in this case, experiments) to solve a problem that targets a

concept (in this case, CVS) that they had not formally or canonically learnt yet. Indeed, Klahr

and Nigam (2004) themselves termed it the "exploration phase." If so, the very effects that

Klahr and Nigam attribute to DI alone seem more appropriately attributed to a generation and

exploration phase (their baseline assessment) followed by DI. Therefore, much as Klahr and

Nigam set out to show that there is little efficacy in students exploring and solving problems

requiring concepts they have not learnt yet, their findings can be reinterpreted to support

precisely the opposing contention that such exploration can in fact be efficacious provided

some form of DI follows, for without it, students may not learn much (as indeed the performance of the students in the discovery learning condition revealed).

Conclusion

Contrary to the commonly held belief that there is little efficacy in having learners solve

novel problems that target concepts they have not learnt yet, this study suggests that there

is indeed such an efficacy even if learners do not formally know the underlying concepts

needed to solve the problems, and even if such problem solving leads to failure initially. Of

course, this failure is productive only if learners are engaged in processes germane for

learning. Thus argued, designing for a certain level of failure (as opposed to minimizing it)

in the initial learning phase may well be productive for learning in the longer run provided

that an appropriate form of instruction is given subsequently. Future research would do

well not to compare discovery learning with DI (over)simplistically, but instead to understand the conditions under which these approaches can complement each other productively.

Appendix A: The complex problem scenario

Mr. Fergusson, Mr. Merino, and Mr. Eriksson are the managers of the Supreme Football

Club. They are on the lookout for a new striker, and after a long search, they short-listed

three potential players: Mike Arwen, Dave Backhand, and Ivan Right. All strikers asked for


the same salary, so the managers agreed that they should base their decisions on the

players’ performance in the Premier League for the last 20 years. Table 4 shows the

number of goals that each striker had scored between 1988 and 2007.

The managers agreed that the player they hire should be a consistent performer. They

decided that they should approach this decision mathematically, and would want a formula

for calculating the consistency of performance for each player. This formula should apply

to all players and help provide a fair comparison. The managers decided to get your help.

Please come up with a formula for consistency and show which player is the most

consistent striker. Show all working and calculations on the paper provided.

Table 4 Number of goals scored by three strikers in the Premier League

Year   Mike Arwen   Dave Backhand   Ivan Right
1988   14           13              13
1989   9            9               18
1990   14           16              15
1991   10           14              10
1992   15           10              16
1993   11           11              10
1994   15           13              17
1995   11           14              10
1996   16           15              12
1997   12           19              14
1998   16           14              19
1999   12           12              14
2000   17           15              18
2001   13           14              9
2002   17           17              10
2003   13           13              18
2004   18           14              11
2005   14           18              10
2006   19           14              18
2007   14           15              18

Appendix B: Examples of pretest items

Central tendencies

The data below show the timings (in minutes) for a 2.4 km run for 40 students in Class 2E1. Calculate the mean, median, and mode of the timings of Class 2E1.

11; 11; 12; 12; 12; 12; 13; 13; 13; 13; 13; 14; 14; 14; 14; 14; 14; 15; 15; 15; 15; 15;

15; 15; 15; 15; 15; 15; 16; 16; 16; 16; 16; 16; 16; 16; 17; 17; 17; 17



Distributions

The heart rate per minute of a group of 20 adults is displayed in the dot diagram below. For

example, 3 adults have a rate of 60 beats per minute. Based on this data set, how many

individuals from a similar group of 40 adults would be expected to have a heart rate of 90 beats or more per minute?

[Dot diagram for the heart rate per minute for a group of 20 adults]
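Although the dot diagram itself is not reproduced here, the intended proportional reasoning is recoverable: if k of the 20 adults have a rate of 90 beats or more per minute, the expected count in a similar group of 40 adults is

$$40 \times \frac{k}{20} = 2k.$$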

Variance

The owners of two cinemas, A and B, argue that their respective cinema enjoys a more

consistent attendance. They collected the daily attendance of their cinemas for 11 days.

The results of their data collection are shown below.

Cinema A Cinema B

Day 1 69 61

Day 2 70 65

Day 3 75 91

Day 4 52 55

Day 5 57 58

Day 6 92 95

Day 7 71 67

Day 8 73 81

Day 9 74 89

Day 10 72 70

Day 11 87 93

Based on the above attendance data and statistics, which cinema do you think enjoys a

more consistent attendance? Please explain mathematically and show your working.
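For the reader, a minimal sketch of one defensible answer, again assuming the population standard deviation as the consistency measure:

from statistics import pstdev

# Daily attendance over 11 days, transcribed from the table above.
cinema_a = [69, 70, 75, 52, 57, 92, 71, 73, 74, 72, 87]
cinema_b = [61, 65, 91, 55, 58, 95, 67, 81, 89, 70, 93]

# A lower standard deviation indicates more consistent attendance.
print(f"Cinema A: SD = {pstdev(cinema_a):.1f}")  # roughly 10.8
print(f"Cinema B: SD = {pstdev(cinema_b):.1f}")  # roughly 14.4

By this measure, Cinema A enjoys the more consistent attendance.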

Appendix C: Examples of posttest items

Procedural fluency item 1

Q1. Marks scored by 10 students on a test on statistics are shown below. As a measure of

the variance, calculate the standard deviation of the test scores.

30; 50; 50; 55; 60; 60; 60; 70; 80; 90
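For the reader, a worked solution in the population formulation (dividing by n; the sample formulation divides by n − 1 and gives s ≈ 16.7):

$$\bar{x} = \frac{605}{10} = 60.5, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{10}(x_i-\bar{x})^2}{10}} = \sqrt{\frac{2522.5}{10}} \approx 15.9$$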


Conceptual understanding item 1

Q2. For Q1, one student came up with another measure of variance by taking the average of the sum of the differences between adjacent scores, as shown below:

$$\frac{(50-30)+(50-50)+(55-50)+(60-55)+(60-60)+(60-60)+(70-60)+(80-70)+(90-80)}{10-1} = 6.67$$

How does the student’s measure of variance compare with the standard deviation as a

measure of variance? Which one is better? Please explain your answer.
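A note for the reader: because the scores are listed in ascending order, the adjacent differences telescope, so the student's measure reduces to the range divided by n − 1,

$$\frac{\sum_{i=1}^{n-1}(x_{i+1}-x_i)}{n-1} = \frac{x_{\max}-x_{\min}}{n-1} = \frac{90-30}{9} \approx 6.67,$$

and therefore ignores all interior spread, unlike the standard deviation, which weighs every score's deviation from the mean.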

Procedural fluency item 2

In preparing for the Youth Olympics in 2010, the Ministry of Community, Youth and

Sports had to decide the month in which to hold the games. They narrowed their options to

July and August, and decided to examine rainfall data for ten randomly selected days in

July and August in 2007 to make a choice. The amounts of rainfall (in millimeters) for the

2 months are shown below.

Day Rainfall in July (mm) Rainfall in August (mm)

Week 1, Day 1 32 25

Week 1, Day 3 35 31

Week 2, Day 2 35 35

Week 2, Day 4 37 37

Week 2, Day 7 37 37

Week 3, Day 2 37 37

Week 3, Day 5 38 38

Week 3, Day 7 39 39

Week 4, Day 5 40 42

Week 4, Day 6 40 49

i. Based on the information, which month should the Ministry choose, given that they

would want a month that has a consistently low amount of rainfall?

Conceptual understanding item 2

ii. A few days later, the Ministry re-examined the data and realized that they had made a mistake in the figure recorded for Week 4, Day 6 in July. Instead of 40 mm, the rainfall should be 60 mm. Given this new figure, which month should the Ministry choose now, if they want one that has a consistently low amount of rainfall?
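For the reader, a minimal sketch showing how the correction flips the answer, again assuming the population standard deviation as the consistency measure:

from statistics import mean, pstdev

# Rainfall (mm) for the ten sampled days, transcribed from the table.
july   = [32, 35, 35, 37, 37, 37, 38, 39, 40, 40]
august = [25, 31, 35, 37, 37, 37, 38, 39, 42, 49]

# Both months average 37 mm, so the decision rests on the spread.
print(mean(july), mean(august))        # 37 37
print(pstdev(july), pstdev(august))    # roughly 2.37 vs 5.98 -> July

# Part (ii): correct Week 4, Day 6 in July from 40 mm to 60 mm.
july_corrected = july[:-1] + [60]
print(pstdev(july_corrected))          # roughly 7.32 -> now August

A single corrected outlier inflates July's spread past August's, which is precisely the conceptual point the item probes.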

Transfer item

Two Secondary Four students were nominated for the ‘‘Best Science Student’’ award for

2009. Muthu Kumaran is the top Physics student, while Alicia Kwan is the top Chemistry

student for 2009. The table below shows the Physics and Chemistry top scorers between

1998 and 2009, with their scores presented in ascending order.


Top physics scorers for the past 12 years Top chemistry scorers for the past 12 years

Name Year Score Name Year Score

Yap Pei Ling 2006 81 Lim Jen Yi 1998 80

Cho Ying Ming 1999 83 Charissa Tan 2001 81

Bala Ayanan 2001 83 Allan Wu 2000 83

Mohammad Azhar 2000 84 Ali Salim 2002 85

Matilda Tay 2002 84 Derick Chan 1999 89

Louis Ho 2005 85 David Tan 2003 90

Tham Jing Ling 2004 85 Abdul Basher 2005 90

Jodie Ang 1998 85 Fredrick Chay 2004 94

Jeremy Goh 2003 85 Linda Siew 2006 95

Chee Haw Ren 2006 85 Terry Lee 2008 96

Susan Teo 2005 86 Low Ming Lee 2007 98

Muthu Kumaran 2009 94 Alicia Kwan 2009 99

Mean 85 Mean 90

Both Muthu and Alicia are the best performers in their respective subjects for the past

12 years. Because there is only one ‘‘Best Science Student’’ award, who do you think

deserves the award more? Please explain your decision mathematically and show your

working.
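For the reader, one mathematically defensible answer (an illustration, not necessarily the intended rubric) standardizes each student's score against the spread of top scores in their own subject, using population standard deviations:

$$\sigma_{\text{Physics}} = \sqrt{\frac{108}{12}} = 3, \qquad \sigma_{\text{Chemistry}} = \sqrt{\frac{478}{12}} \approx 6.3$$

$$z_{\text{Muthu}} = \frac{94-85}{3} = 3.0, \qquad z_{\text{Alicia}} = \frac{99-90}{6.3} \approx 1.4$$

On this reading, Muthu's score lies farther above his field's norm, so he arguably deserves the award more.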

References

Bielaczyc, K., & Kapur, M. (2010). Playing epistemic games in science and mathematics classrooms. Educational Technology, 50(5), 19–25.

Brown, A., & Campione, J. (1994). Guided discovery in a community of learners. In K. McGilly (Ed.), Classroom lessons: Integrating cognitive theory and classroom practice (pp. 229–270). Cambridge: MIT Press.

Carroll, W. (1994). Using worked examples as an instructional support in the algebra classroom. Journal of Educational Psychology, 86, 360–367.

Chi, M. T. H., Glaser, R., & Farr, M. J. (1988). The nature of expertise. Hillsdale: Erlbaum.

Clifford, M. M. (1984). Thoughts on a theory of constructive failure. Educational Psychologist, 19(2), 108–120.

Cooper, G., & Sweller, J. (1987). The effects of schema acquisition and rule automation on mathematical problem-solving transfer. Journal of Educational Psychology, 79, 347–362.

diSessa, A. A., Hammer, D., Sherin, B. L., & Kolpakowski, T. (1991). Inventing graphing: Meta-representational expertise in children. Journal of Mathematical Behavior, 10(2), 117–160.

Hardiman, P., Pollatsek, A., & Weil, A. (1986). Learning to understand the balance beam. Cognition and Instruction, 3, 1–30.

Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379–424.

Kapur, M. (2009). Productive failure in mathematical problem solving. Instructional Science, 38(6), 523–550. doi:10.1007/s11251-009-9093-x.

Kapur, M. (2010). A further study of productive failure in mathematical problem solving: Unpacking the design components. Instructional Science, 39(4), 561–579. doi:10.1007/s11251-010-9144-3.

Kapur, M., & Bielaczyc, K. (2011). Classroom-based experiments in productive failure. In L. Carlson, C. Holscher, & T. Shipley (Eds.), Proceedings of the 33rd annual conference of the cognitive science society (pp. 2812–2817). Austin: Cognitive Science Society.

Kapur, M., & Bielaczyc, K. (2012). Designing for productive failure. The Journal of the Learning Sciences, 21(1), 45–83.

Kapur, M., & Kinzer, C. (2009). Productive failure in CSCL groups. International Journal of Computer-Supported Collaborative Learning (ijCSCL), 4(1), 21–46.

Kapur, M., & Rummel, N. (2009). The assistance dilemma in CSCL. In A. Dimitracopoulou, C. O'Malley, D. Suthers, & P. Reimann (Eds.), Computer supported collaborative learning practices: CSCL 2009 community events proceedings, Vol. 2 (pp. 37–42). International Society of the Learning Sciences.

Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work. Educational Psychologist, 41(2), 75–86.

Klahr, D., & Nigam, M. (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15(10), 661–667.

Mathan, S., & Koedinger, K. (2003). Recasting the feedback debate: Benefits of tutoring error detection and correction skills. In U. Hoppe, F. Verdejo, & J. Kay (Eds.), Artificial intelligence in education: Shaping the future of education through intelligent technologies (pp. 13–20). Amsterdam: IOS Press.

Paas, F. (1992). Training strategies for attaining transfer of problem-solving skill in statistics: A cognitive-load approach. Journal of Educational Psychology, 84, 429–434.

Paas, F., & van Merrienboer, J. (1994). Variability of worked examples and transfer of geometrical problem solving skills: A cognitive-load approach. Journal of Educational Psychology, 86, 122–133.

Roll, I. (2009). Structured invention activities to prepare students for future learning: Means, mechanisms, and cognitive processes. Pittsburgh: Thesis.

Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3(4), 207–217.

Schwartz, D. L., & Bransford, J. D. (1998). A time for telling. Cognition and Instruction, 16(4), 475–522.

Schwartz, D. L., & Martin, T. (2004). Inventing to prepare for future learning: The hidden efficiency of encouraging original student production in statistics instruction. Cognition and Instruction, 22(2), 129–184.

Strand-Cary, M., & Klahr, D. (2008). Developing elementary science skills: Instructional effectiveness and path independence. Cognitive Development, 23(4), 488–511.

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.

Sweller, J. (2010). What human cognitive architecture tells us about constructivism. In S. Tobias & T. M. Duffy (Eds.), Constructivist instruction: Success or failure (pp. 127–143). New York: Routledge.

Sweller, J., & Cooper, G. A. (1985). The use of worked examples as a substitute for problem solving in learning algebra. Cognition and Instruction, 2, 59–89.

Tobias, S., & Duffy, T. M. (2010). Constructivist instruction: Success or failure. New York: Routledge.

Trafton, J. G., & Reiser, R. J. (1993). The contribution of studying examples and solving problems to skill acquisition. In M. Polson (Ed.), Proceedings of the 15th annual conference of the cognitive science society (pp. 1017–1022). Hillsdale: Erlbaum.

Tuovinen, J. E., & Sweller, J. (1999). A comparison of cognitive load associated with discovery learning and worked examples. Journal of Educational Psychology, 91, 334–341.

Van Lehn, K., Siler, S., Murray, C., Yamauchi, T., & Baggett, W. B. (2003). Why do only some events cause learning during human tutoring? Cognition and Instruction, 21(3), 209–249.