STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific...

31
Scientific Discovery 1 STUDIES OF SCIENTIFIC DISCOVERY: COMPLEMENTARY APPROACHES AND CONVERGENT FINDINGS See: Psychological Bulletin, 125 (5), 524-543 David Klahr & Herbert A. Simon Department of Psychology Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 (412) 268-3670 [email protected] Abstract This review integrates four major approaches to the study of science -- historical accounts of scientific discoveries, psychological experiments with non-scientists working on tasks related to scientific discoveries, direct observation of ongoing scientific laboratories, and computational modeling of scientific discovery processes -- by viewing them through the lens of the theory of human problem solving. We provide a brief justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting them. Then, we apply these criteria to the different approaches, and indicate their complementarities. Finally, we provide several examples of convergent principles of the process of scientific discovery. The central thesis of this paper is that, although research on scientific discovery has taken many different paths, they show remarkable convergence upon key aspects of the discovery processes, allowing us to aspire to a general theory of scientific discovery. This convergence is often obscured by the disparate cultures, research methodologies, and theoretical foundations of the various disciplines that study scientific discovery, including history and sociology as well as those within the cognitive sciences (e.g., psychology, philosophy, and artificial intelligence). Despite these disciplinary differences, common concepts and terminology can express the central ideas and findings about scientific discovery from the various disciplines, treating discovery as a particular species of human problem-solving. Moreover, we may be able to use these concepts and this vocabulary over an even broader domain, to converge toward a common account of discovery in many areas of human endeavor: practical, scientific and artistic, occurring both in everyday life and in specialized technical and professional domains. The doing of science has long attracted the attention of philosophers, historians, anthropologists, and sociologists. More recently, psychologists also have begun to turn their attention to the phenomena of scientific thinking, and there is now a large and rapidly growing literature on the psychology of science. (A good description of the field in its infancy can be found in Tweney, Doherty & Mynatt, 1981, and a recent summary of topics and findings from investigations of the developmental, personality, cognitive and social psychology of science can be found in Feist & Gorman, 1998). Our review links four major approaches to the study of science -- historical accounts of scientific discoveries, laboratory experiments with non-scientists working on tasks related to scientific discoveries, direct observation of ongoing scientific laboratories, and computational modeling of scientific discovery processes -- by viewing them through the lens of the theory of human problem solving. First we provide a brief justification for the study of scientific discovery. Then we summarize the major approaches and provide criteria for comparing and contrasting them. Next, we apply these criteria to the different approaches, and indicate their complementarities. Finally, we provide several examples of convergent principles of the process of scientific discovery.

Transcript of STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific...

Page 1: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 1

STUDIES OF SCIENTIFIC DISCOVERY:COMPLEMENTARY APPROACHES AND CONVERGENT FINDINGS

See: Psychological Bulletin, 125 (5), 524-543

David Klahr & Herbert A. Simon

Department of PsychologyCarnegie Mellon University

5000 Forbes AvenuePittsburgh, PA 15213

(412) [email protected]

AbstractThis review integrates four major approaches to the study of science -- historical accounts of scientificdiscoveries, psychological experiments with non-scientists working on tasks related to scientific discoveries,direct observation of ongoing scientific laboratories, and computational modeling of scientific discoveryprocesses -- by viewing them through the lens of the theory of human problem solving. We provide a briefjustification for the study of scientific discovery, a summary of the major approaches, and criteria forcomparing and contrasting them. Then, we apply these criteria to the different approaches, and indicate theircomplementarities. Finally, we provide several examples of convergent principles of the process of scientificdiscovery.

The central thesis of this paper is that, althoughresearch on scientific discovery has taken manydifferent paths, they show remarkableconvergence upon key aspects of the discoveryprocesses, allowing us to aspire to a generaltheory of scientific discovery. This convergenceis often obscured by the disparate cultures,research methodologies, and theoreticalfoundations of the various disciplines that studyscientific discovery, including history andsociology as well as those within the cognitivesciences (e.g., psychology, philosophy, andartificial intelligence).

Despite these disciplinary differences,common concepts and terminology can expressthe central ideas and findings about scientificdiscovery from the various disciplines, treatingdiscovery as a particular species of humanproblem-solving. Moreover, we may be able touse these concepts and this vocabulary over aneven broader domain, to converge toward acommon account of discovery in many areas ofhuman endeavor: practical, scientific andartistic, occurring both in everyday life and inspecialized technical and professional domains.

The doing of science has long attracted theattention of philosophers, historians,anthropologists, and sociologists. More

recently, psychologists also have begun to turntheir attention to the phenomena of scientificthinking, and there is now a large and rapidlygrowing literature on the psychology of science.(A good description of the field in its infancycan be found in Tweney, Doherty & Mynatt,1981, and a recent summary of topics andfindings from investigations of thedevelopmental, personality, cognitive and socialpsychology of science can be found in Feist &Gorman, 1998).

Our review links four major approaches tothe study of science -- historical accounts ofscientific discoveries, laboratory experimentswith non-scientists working on tasks related toscientific discoveries, direct observation ofongoing scientific laboratories, andcomputational modeling of scientific discoveryprocesses -- by viewing them through the lens ofthe theory of human problem solving. First weprovide a brief justification for the study ofscientific discovery. Then we summarize themajor approaches and provide criteria forcomparing and contrasting them. Next, weapply these criteria to the different approaches,and indicate their complementarities. Finally,we provide several examples of convergentprinciples of the process of scientific discovery.

Page 2: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 2

Why Study Scientific Discovery?�What accounts for the appeal of science as

an object of study? The answer varies somewhataccording to discipline. From our perspective ascognitive scientists, we see five reasons forstudying science: for its human and humanevalue, to understand its mythology, to study theprocesses of human thinking in some of its mostcreative and complex forms, to gain insight intothe developmental course of scientific thinking,and to design artifacts — computer programsand associated instrumentation -- that can carryout some of the discovery processes of scienceand aid human scientists in carrying out others.

ValueThe nature of human thinking is one of the

"Big Questions" — along with the nature ofmatter, the origin of the Universe, and the natureof life. The kind of thinking we call "scientific"is of special interest, both for its apparentcomplexity and for its products.

Scientific thinking has enhanced our ability tounderstand, predict, and control the naturalforces that shape our world. As the myths ofPrometheus and Pandora forewarned us,scientific discoveries have also providedfoundations for a technical civilization fraughtwith opportunities, problems and perils; and wecall on science increasingly to help us solvesome of the very problems it has inadvertentlycreated. The processes that produced theseoutcomes are irresistible objects of study.Indeed, the same forces that motivate physicists,chemists, mathematicians, and biologists tounderstand the important phenomena in theirdomains drive historians, philosophers,sociologists, and psychologists to investigatescience itself.

MythologyA rich mythology about the ineffability of the

scientific discovery process, producing aparadoxical view of science as magic, hascaptured the romantic imagination. As Boden(1990) puts it in her book on the psychology ofcreativity:

The matters of the mind have beeninsidiously downgraded in scientific circlesfor several centuries. It is hardly surprising,then, if the myths sung by inspirationalistsand romantics have been music to our ears.While science kept silent aboutimagination, antiscientific songs naturallyheld the stage. (p. 288)

At times, the mythology of the“inspirationalists and romantics” has beenpromulgated by eminent scientists. For example,Einstein, in one of his discussions withWertheimer (1959[1945]) reveals his uncertaintyabout, "whether there can be a way of reallyunderstanding the miracle of thinking" (p. 227).Nevertheless, Einstein does allow that the sameprocesses that support everyday thought alsosupport scientific thought:

The scientific way of forming conceptsdiffers from that which we use in ourdaily life, not basically, but merely inthe more precise definition of conceptsand conclusions; more painstaking andsystematic choice of experimentalmaterial, and greater logical economy.(“The common language of science,”1941, reprinted in Einstein, 1950, p.98)We believe that Einstein was wrong about the

first claim (that thinking is miraculous) andcorrect about the second (that scientific conceptformation is not qualitatively different from theeveryday variety). The study of scientificdiscovery aims at specifying how normalcognitive processes enable humans to generatethe "precise definitions," "systematic choice ofexperimental material," and "logical economy"that Einstein identifies as the hallmarks ofscientific thought.

Pressing the LimitsA common scientific strategy for

understanding a complex system is to explore itsbehavior at the boundaries. "Pushing theenvelope" allows us to test whether the samemechanisms that account for normalperformance can account for extremeperformance. For example, the same forces thataccount for lift in an airplane's airfoil alsoaccount for stalls, but the mechanisms ofsubsonic flight cannot fully account for thedynamics of supersonic flight.

In human cognition, the products of scientificthinking lie at the boundaries of thoughtcapabilities. They epitomize the systematic, andcumulative construction of our view of the worldaround us (and in us). Scientific knowledgerepresents, as Perkins (1981) puts it, "the mind'sbest work". There are other manifestations ofcognitive excellence, but science has an internalcriterion of progress that sets scientificdiscovery somewhat apart from other complexhuman thought, such as the creation of new artor political institutions.

Page 3: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 3

Although the products of science reach one ofthe limits of human thought, it remains an openquestion whether or not the processes thatsupport creative scientific discovery are widelydifferent from those found in morecommonplace thinking. We hypothesize thatthey are not. Sir Francis Crick's (1988)reflections on the processes leading to discoveryof the structure of DNA concur with this view:

I think what needs to be emphasized aboutthe discovery of the double helix is that thepath to it was, scientifically speaking, fairlycommonplace. What was important wasnot the way it was discovered, but theobject discovered—the structure of DNAitself. (p. 67; emphasis added)Although Crick views the thinking that led

him and Watson to make this remarkablediscovery as "fairly commonplace" scientificthinking, he stops short of the additional claim— made by Einstein — that commonplacescientific thinking is not basically different fromeveryday thinking. Combined, the two claimssupport the view that one can study scientificthinking by, for example, examining thethought processes of participants in psychologyexperiments. Even if the discoveries thatparticipants make in such experiments — theproducts of their inquiries — are of no scientificsignificance, the explication of the processesused to make those discoveries can add to ourunderstanding of real-world scientific discovery.

The Paradox of Children's Thinking and itsDevelopment

The similarities between children’s thinkingand scientific thinking have an inherent allureand an internal contradiction. The allure residesin the inescapable wonder and openness withwhich both children and scientists approach theworld around them.

Children are born scientists. From the firstball they send flying to the ant they watchcarry a crumb, children use science'stools—enthusiasm, hypothesis, tests,conclusions—to uncover the world'smysteries. But somehow students seem tolose what once came naturally. (Parvanno,1990)The paradox comes from the fact that

investigations of children’s thinking haveproduced evidence that partially supportsdiametrically opposing views of their scientificreasoning skills. In support of the “child as ascientist” position, the developmental literatureis replete with reports showing that very young

children can formulate theories (Brewer &Samarapungavan, 1991; Wellman & Gelman,1992; Karmiloff-Smith, 1988), reason aboutcritical experiments (Sodian, Zaitchik & Carey,1991; Samarapungavan, 1992), and evaluateevidence (Fay & Klahr, 1996). Manypsychological studies also show that adults oftenexhibit systematic and serious flaws in theirreasoning (Kuhn, et al., 1988; Schauble &Glaser, 1990) even after years of formalscientific training (Mitroff, 1974). In contrast,many investigators (e.g., Kuhn, Amsel, &O'Loughlin, 1988; Kern, Mirels, & Hinshaw,1983; Siegler & Liebert, 1975) havedemonstrated that trained scientists, and evenuntrained lay adults, commonly outperformchildren on a variety of scientific reasoning tasks

The conflicting theoretical claims andempirical results emerging from the child-as-scientist debate have important implications forscience education. Moreover, the debate raisessome deep questions about how both scientistsand children "really" think -- questions that canonly be approached through more preciseformulation and study of the empirical and thetheoretical aspects of scientific thinking anddiscovery.

Machines in the Scientific ProcessThe final reason for studying science is that

such study may lead to better science. What welearn about the science of science leads into akind of “engineering of science” in which -- asin other areas -- we use our knowledge of anatural process to create an artifact thataccomplishes the same ends by improved means.

This has already happened for scientificdiscovery, as computational models used astheories of discovery in specific domains havebeen transformed into computer programs thatactually do some of the discovery in thesedomains. An early example of this transitionfrom psychological model to expert system wasthe DENDRAL program that, taking massspectrogram data as input, identified themolecules that had produced the spectra. Amongthe descendants of DENDRAL are programsthat carry out automatically much of the analysisfor genome sequencing in biology, and programsthat, independently or in association with humanscientists, discover plausible reaction paths forimportant chemical reactions.

Recent examples include Valdes-Perez's(1994 a, b, c) systems for discoveries inchemistry and physics, Fajtlowicz's inmathematics (Erdos, Fajtlowicz & Staton, 1991),Hendrickson's program for the synthesis of

Page 4: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 4

organic compounds (Hendrickson & Sander,1995), and Callahan & Sorensen's (1992)systems for making discoveries in the socialsciences (see Valdes-Perez, 1995; and Darden,1997, for brief reviews of recent work in thefield.)

Today, the conception and design of expertsystems that can collaborate with humanscientists in making discoveries is a vigorousand growing field of artificial intelligenceresearch, enlisting the efforts of both computerscientists and natural scientists in the disciplineswhere the applications are being made. We willnot cover this activity in this paper, but callattention to it because ideas from the expertsystems research are relevant for the theory ofhuman scientific thinking; and vice versa,components of human discovery processes maybe embedded in practical expert systems.

Conclusion: Why Study Scientific Discovery?Scientific discovery is a highly attractive area

for research because it possesses severalconsequential features: relevance to one of thegreat scientific questions, a mythology, adomain for testing theory at the limits, adevelopmental paradox, and direct applicabilityto expert system design. How does one go aboutdoing such research?

Approaches to the Study of ScienceEmpirical investigations of science fall into

five overlapping categories: (a) Historicalaccounts, (b) laboratory studies, (c) observationsof ongoing discovery, (d) computational orsimulation models, and (e) sociological studies.In this section we describe each approachbriefly, and in the following section, we discusstheir complementarity and convergence.

Historical AccountsHistorical accounts of scientific discovery

usually aim at describing the cognitive andmotivational processes of persons who havemade major contributions to scientificknowledge. To varying degrees, they examineboth the processes germane to the scientificproblems themselves ("internalist" accounts) andthe interaction of these processes with thebroader social environment ("externalist"accounts). These accounts -- based on analysesderived from diaries, scientific publications,autobiographies, lab notebooks, correspondence,interviews, grant proposals, and memos -- havebeen provided not only by historians (e.g.,Galison, 1987; Holmes, 1985 ), but also byphilosophers (e.g., Gooding, 1990; Nersessian,

1992; Thagard, 1992) and psychologists (e.g.,Feist, 1991, 1994; Gruber, 1974; Terman, 1954).The analyses are sometimes further enriched byretrospective interviews with the scientists —for example, Wertheimer's (1945) classicanalysis of Einstein's development of specialrelativity, or Thagard's (forthcoming-a) analysisof the recent discovery of the bacterial origin ofstomach ulcers.

Laboratory StudiesAnother way to study science is to observe

people's problem-solving processes in situationscrafted to isolate one or more essential aspectsof "real world" science. These studies aretypically carried out in the psychologylaboratory under the standard rubrics ofexperimental design, with experimental andcontrol conditions, and the use of statisticalsignificance tests. Participants in such studieshave included young children (Schauble, 1990;Siegler & Liebert, 1975; Sodian, Zaitchik, &Carey, 1991), college sophomores (Mynatt,Doherty, & Tweney, 1977, 1978; Schunn, 1995;Qin & Simon, 1990) , lay persons (Klahr, Fay &Dunbar, 1993; Kuhn, 1989), and practicingscientists (Schunn & Anderson, 19xx). The taskshave ranged from some that bear only anabstract relation to real scientific tasks to othersin which a real scientific discovery was strippedto its essential features so that it could be“replicated” in the psychology laboratory.

Examples of abstract tasks include thediscovery of the physics of an artificial universe(Mynatt, Doherty, & Tweney, 1977, 1978), thediscovery of an unknown function on aprogrammable device (Klahr & Dunbar, 1988;Schunn, 1995), and the discovery of arbitraryrules and concepts (Bruner, Goodnow, & Austin,1956; Wason, 1960). Examples of laboratorysimplification of real scientific discoveriesinclude Dunbar’s (1993) simulated moleculargenetics laboratory, in which participants werechallenged to replicate Jacob & Monod’sdiscovery of genetic control and Schunn &Anderson’s (in press) comparison of experts’and novices’ ability to design and interpretmemory experiments.

But laboratory experimentation plays a muchbroader role in science than simply as a tool fortesting hypotheses that someone has proposed,and experimentation on the discovery processitself can and should have similar breadth.. Inscience there is an important, and extremelycommon, form of experiment, at times referredto somewhat dismissively as "exploratory," thatis guided by no specific hypothesis to be tested,

Page 5: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 5

and no clear control condition, but only a vagueand general direction of inquiry, . The goal ofexploratory experiments is to permit phenomenato appear that will invite exploration or suggestwhole new forms of representation or generatenew hypotheses (Simon & Kotovsky, 1963).Exploratory experiments are of great importancein science, even though they are often treated assecond class citizens in textbooks on researchmethodology.

Michael Faraday's discovery ofelectromagnetic induction in 1831 came out ofjust such experiments, which were guided by nohypothesis more specific than "if, as Oerstedshowed, electric currents can generatemagnetism, then there should be circumstancesunder which magnetism will generate electriccurrents."1 There was no control condition, butjust a great deal of skillful manipulation ofapparatus, each new experiment being suggestedby the outcomes of the previous ones, to find anarrangement that would produce the hoped-forphenomenon and contribute to understanding theconditions under which it would appear.

Similar comments can be made about Krebs'experiments that led to the discovery of theornithine cycle for the in vivo synthesis of urea.The experiments were largely driven by thebroad idea that amino acids and ammonia werelikely sources of the nitrogen in urea, whichprovided a reason for experiments with variousamino acids. However, it was a lucky accidentthat the key catalyst for the reaction was anamino acid, ornithine, which, not being a sourceof the nitrogen at all, was tested for the "wrong"reason. The experiments do not fit the textbookparadigm of control and hypothesis testing, butwere clearly exploratory in nature, and Krebshimself saw them in that way.

In the realm of the psychology of discovery,examples of such exploratory laboratory studiesinclude Qin & Simon’s (1990) experiment inwhich college sophomores were presented withdata on planetary distances and periods to see ifthey could discovery Kepler’s third law, andKlahr & Dunbar’s (1988) initial study with theBig Trak. (In fact, in an interesting recursivetwist, one of the most important results producedby Klahr & Dunbar’s exploratory study was thattheir participants frequently did “experiments”in the absence of any hypothesis, but with thegoal of generating some interesting behavior of 1. For a contrary view, see Williams (1965), butGooding (1980) develops in detail essentially theposition taken here.

the device they were exploring.) We willdescribe a few other such studies in latersections of this paper.

In studying scientific discovery, laboratoryexperiments need not be limited to replicatingthe processes scientists have used to discoverhypotheses to fit data, or to test hypotheses.Exploratory experiments can also be used tohelp interpret other forms of evidence aboutdiscovery, and we will give examples, especiallyfrom the work of Gooding (1990), of originaland insightful efforts in this direction. Forinstance, reconstructing the instruments andmethods used in historical discoveries permitsthe student of discovery to re-experience theprocesses of generating and interpreting theoriginal data in its physical context, therebycasting light on the difficulties faced by theoriginal investigators in arriving at the meaningsof what they "saw."

Observation of Ongoing DiscoveryThe most direct way to study science is to

study scientists as they ply their trade. Theobserver records the important activities of day-to-day lab meetings, presentations, and pre- andpost-meeting interviews, lab notes and paperdrafts. The raw data are then coded andinterpreted within the framework ofpsychological constructs.

Both pragmatic and substantive factors makedirect observation extraordinarily difficult, andtherefore the least common approach to studyingscience. It requires the trust and permission ofthe scientists to allow an observer in their midst.The observer must be sufficiently well-versed inthe domain of investigation to understand deeplywhat is happening and what the fundamentalissues, problems and solutions are2. Moreover, itis extremely time consuming. Finally, suchinvestigations require a bit of luck, because theoutcome of such a study is of much moreinterest if a major discovery is made during theperiod of observation than if nothing of anygreat importance happens. One recentexemplary case of this approach can be found in 2. However, those interested in sociological factorssometimes prefer instead to be ignorant of thesubstantive knowledge and scientific conventions ofthe laboratory under investigation. “Latour’sknowledge of science was non-existent; his masteryof English was very poor; and he was completelyunaware of the existence of the social studies ofscience. Apart from ( or perhaps even because of)this last feature, he was thus in a classic position ofthe ethnographer sent to a completely foreignenvironment.” Latour & Woolgar, 1986, p273

Page 6: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 6

Dunbar's studies of four different world-classlabs for research in molecular genetics (Dunbar,1994, 1997; Dunbar & Baker, 1994). Giere’s(1988, Chap. 5) account of how a high energyphysics lab is organized provides a somewhatsimilar example of the "in vivo " approach, incontrast to the "in vitro" approach of laboratoryexperiments.

Computational Models of Discovery: Artifactand Explanation

A theory of scientific discovery processes cansometimes be cast in the precise terms of acomputational model that simulates theseprocesses and re-enacts discoveries. From amathematical standpoint, such a model is a set ofdifference equations that describes and predictsthe dynamic path the discovery system willfollow from the time it takes up a problem untilit solves or abandons it. Thus, it is highlysimilar to the systems of differential equationsused to model theories in the natural science.

The goal of such a model is to replicate thekey steps in the cognitive processes of scientistsas they made important discoveries (see Shrager& Langley, 1990 for an introduction to thisliterature). Modeling draws upon the same kindsof information as do the historical accounts.However, it goes beyond the historical record tohypothesize cognitive mechanisms that aresufficiently specific to make the samediscoveries the human scientist made, followingthe same path.

Computational modeling of conceptformation and rule induction -- activitiesrelevant to scientific discovery -- has a longhistory (Hovland & Hunt, 1960; Simon &Kotovsky, 1963). More recent (and much morecomplex) examples of the approach includecomputational models of the cognitive processesused by Kepler, Glauber, Dalton, Krebs, andothers in making historically important scientificdiscoveries (Cheng & Simon, 1992; Gooding,1990; Kulkarni & Simon, 1990;Simon, Langley,& Bradshaw, 1981; Langley, et al., 1987;Grasshoff and May, 1995).

Sociological ApproachesIn recent years, the sociology of science has

directed most of its attention to externalistaccounts of discoveries, which seek to explaindiscovery as a product of political,anthropological, or social forces (Bloor, 1981;Latour & Woolgar, 1986; Pickering, 1992;Shadish & Fuller, 1994). In these approaches,the mechanisms linking such forces to actualscientific practice are usually motivational,

social-psychological, or psychodynamic, ratherthan cognitive (Bijker, Hughes, & Pinch, 1987).An interdisciplinary amalgam of such studieshas developed under the rubric of Social Studiesof Science (for examples, see Mahoney, 1979;Landau, 1977). This orientation to the study ofscience provides some important insights onhow social and professional constraintsinfluence scientific practices, although theseaccounts tend to treat cognitive processes at alarge "grain size" relative to the questionsaddressed in this paper (Latour & Woolgar,1986).

Some of this work has taken an extremedeconstructionist turn that has been rejected(Gross & Levitt, 1994) and parodied (Sokal,1996). Unfortunately, such extremes have ledmany researchers in the physical and biologicalsciences to mistakenly conclude that all "socialscience approaches" -- including cognitivepsychology -- have little to contribute to a betterunderstanding of science. However, the worksurveyed here is not subject to this criticism, forit conforms to the same canons of science as theresearch it undertakes to describe and explain.In any event, because of our internalistemphasis, we will not have much to say, in thispaper, about the sociology of science, either inits defensible or indefensible forms.

Assessing the ApproachesEach of the four approaches that are

employed in research on scientific discovery hasits particular strengths and weaknesses. In thissection we will summarize the criteria generallyused for evaluating research methods in generaland psychological research methods in particularand then we will assess -- in general terms -- theextent to which the different approaches satisfythese criteria.

Criteria for Evaluating Research methodsThe effectiveness of a research strategy for

addressing a particular problem of scientificdiscovery can be assessed in terms of eightcriteria: (i) face validity, (ii) construct validity,(iii) temporal span and resolution of data, (iv)fruitfulness for discovering new phenomena, (v)rigor and precision, (vi) control and factorabilityof variables, (vii) external validity, and (viii)social and motivational context.

Face validity. A study has face validity if itmeasures what it is supposed to measure.Research on scientific discovery has facevalidity to the extent that the phenomenon beinginvestigated is clearly an instance of somethingbeing discovered by a scientist. The farther the

Page 7: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 7

study is from science or discovery, the lower theface validity. For example, research on adiscovery of historical importance -- such asFaraday's discovery of the magnetic induction ofelectricity (Duncan & Tweney, 1997) or hiswork on acoustics (Ippolito & Tweney, 1995) --has very high face validity because there is noquestion that the behavior being studied reallydid lead to a discovery. On the other hand,research that asks college students to list asmany uses of a brick as they can think of (Finke,Ward and Smith, 1992, pp.183-4) or to discovera rule about number triples of which 2-4-6 is anexample (Wason, 1960) has lower face validity,because the extent to which the “discovery” hasanything in common with genuine scientificdiscoveries is open to question. In anappropriate research design, the way in whichparticipants approach these kinds of laboratorytasks may, in fact, reveal something about theirscientific skills, but this is not evident unless wehave independent evidence for similaritybetween thinking in the psychologist’s lab andthinking about solving a real scientific problem .

Construct validity. As phenomena in adomain begin to acquire a theory, theoreticalterms are generally introduced, referring toentities that are not directly observable. Theability to evaluate theoretical terms and testtheories containing them depends on theoperations and instruments used to measurepresence and magnitude of such terms indirectlyand convergently (Simon, 1974, 1983, 1985).Thus, construct validity evaluates how well themeasures being used are goodoperationalizations of the underlying theoreticalconstructs about scientific discovery.

Temporal span and resolution of data. Twoimportant and related criteria for assessing howwell a research method captures importantprocesses are the temporal span of the discoveryepisode that is studied and the temporalresolution of the data that describe behaviorduring that span. Typically, longer spans ofepisodes produce data of lower resolution. Thus,a particular problem may occupy a scientist foran hour, a day, a week, a year, or decades, whileat the other end of the scale, insights and acts ofrecognition might require only fractions of asecond of cognitive processing.

For example, while working primarily onelectrochemical problems, Faraday broodedintermittently about electromagnetism for anentire decade: from 1821, when he first learnedof Oersted's induction of magnetism by anelectric current, to 1831, when he made hisinitial key discovery of induction of currents by

magnets. Throughout this period, he kept ameticulous diary that described each experimentand its outcome. Thus, the grain size duringperiods when he was doing electromagneticexperiments was on the order of a few hours, butthere were often gaps of months, or even years,between such experiments..

More than two decades passed between thetime when Kepler published (Kepler, 1596) anearly (and erroneous) version of his Third Lawand the time when he returned to the problemand got the right answer after a few weeks'further work (Kepler, 1619). However, here wehave a much rougher grain size, because ouronly record of his work on the problem are (a) afew published paragraphs when he announcedthe erroneous law, and (b) the pages in thevolume in which he published his successfulsecond set of calculations, with a few commentson how the work had been stretched out overseveral months because of the computationalerrors he made.

Of course, not all discoveries extend oversuch long periods of time: Planck achieved hisrevision of Wien’s law of blackbody radiation ina single evening in 1900 (Langley, et al., 1987);and when we move from real science tolaboratory studies of scientific thinking, we enterthe realm of tasks that typically require fromtens to hundreds of minutes. For example,participants in Wason’s rule discovery tasksusually take about 20 to 30 minutes to discoverthe rule; Klahr & Dunbar’s participants spentabout 30 minutes to make their discoveries; Qin& Simon’s participants took about an hour, onaverage, to discover Kepler’s Third Law;participants in Schunn’s (1995) “milk-truck”microworld took up to 90 minutes before theydiscovered one of its complex rules, and someparticipants in Mynatt, et al.'s (1978) artificialuniverse task worked at it for up to 10 hours. Agraduate student was given the same data thatBalmer used to discover in 1885, after severalmonths of search, his formula for the hydrogenspectrum . In about six weeks of half-timework, the student rediscovered the formula(Simon, personal communication).

Although these discoveries occur overrelatively brief spans, the data resolution iscorrespondingly finer grained, so thatparticipants’ hypotheses, representations,insights, and impasses can be recorded overextremely short durations, sometimes as little asa fraction of a second. Observation of suchsituations enables the researcher to track theinitial interpretation of problem instructions, thecreation of initial representations, impasses and

Page 8: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 8

revised representations. Data collected fromsuch investigations often include extensiveverbal and behavioral protocols that can beanalyzed at varying levels of aggregation.

Fruitfulness for discovering new phenomena.The contemporary literature on researchmethodology is dominated by the notion,promulgated by Popper (1959) among others,that the purpose of observation in general, andexperiment in particular, is to test hypotheses inorder either to falsify or validate them. Incontrast to this position, we have argued thatmuch of the important empirical work in scienceis undertaken -- to use Reichenbach's phrase --in the context of discovery rather than thecontext of verification (see Simon, 1973). Thatis, a major goal of empirical work in science isto discover new phenomena and generatehypotheses for describing and explaining them,and not simply to test hypotheses that havealready been generated. Indeed, theories cannotbe tested until they have been created, andcreation takes place in the context of discovery,not verification. In his Patterns of Discovery(1958) Hanson took a pioneering step towardgiving discovery "equal time" with verificationin the study of science.

The distinction between finding newphenomena (e.g., Oersted's unexpecteddiscovery that an electric current created amagnetic field orthogonal to it) and testing atheory that explains it (e.g., Michelson andMorley's experiment showing that the velocityof light was independent of its motion throughan "aether"), is closely related to the distinctionin computational models of discovery betweendata-driven and theory-driven systems (Karp,1990; Langley, et al., 1987) and to thedistinction, in psychological studies ofdiscovery, between experimenters and theorists(Klahr & Dunbar, 1988)

Hence, among our criteria for evaluatingmethods for research on scientific discovery, weinclude their fruitfulness for discovery. We willuse this criterion to evaluate the extent to whicha particular approach to research on scientificdiscovery is likely to uncover new phenomenaabout the discovery process. This criterion isrelated to construct validity, because how welltheoretical constructs are operationalizeddepends, in part, on how such terms aregenerated in the first place (Langley, et al.,1987).

Rigor and precision. At one end of the scalewe have data (usually numerical) that can bereported with precision (e.g., reaction times,proportion of correct answers, changes in

hypotheses, number of trials to solution), orevents recorded in a reproducible coding scheme(e.g., rigorous coding of verbal protocols); at theother end, we have descriptions (usually verbaland informal) of complex events, involvinginterpretation and summarization of the sourcedata beyond the resources of a precise codingscheme.

Typical examples of the former include thevast literature on discovery-related cognitiveprocesses: concept learning studies,investigations of complex problem solving, andthe “simulated science” investigations citedearlier, where speed of solution, errors along theway, and other statistics are measured. Typicalexamples of the latter include the analyses ofFaraday’s diaries (Duncan & Tweney, 1997) orDarwin's notebooks containing successiveversions of what became his theory of evolutionby natural selection (Gruber, 1974). Unlessqualitative data are coded and analyzedaccording to unambiguous and objective criteria,it is difficult to make precise predictions andhence to test theories rigorously against the data.

Control and factorability of variables.Experiments, at least those described intextbooks on research methods, aim to separatethe effects of specific variables upon thephenomena of interest, to control for the effectsof other variables and to minimize thesystematic effects of uncontrolled sources oferror. The so-called "experimental method,"adorned with all of these attributes, is oftentaken as the quintessence of science.

However, we have already noted that muchscience is, and must be, observational rather thanexperimental in this special sense -- that is,aimed at discovering new and interestingphenomena and using them to stimulate thegeneration of hypotheses, rather than at testinghypotheses formulated prior to observation .There is a literature on observational (Webb,Campbell, Schwartz, & Sechrest, 1966), quasi-experimental (Cook & Campbell, 1979), case-studies (Barlow & Hersen, 1984; Kratochwill1978), and naturalistic methodologies. There isless than full consensus about the role of thesemethodologies in the processes of science. Animportant goal in research on scientificdiscovery is to assess and evaluate the respectiveroles of "orthodox" experiments and other kindsof observation in the processes of discovery andverification.

A great deal can be learned about thediscovery process by examining failures as wellas successes. Although the historical record ofscientific discovery focuses more on the latter

Page 9: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 9

than on the former, there is a literature thatattempts explain why, in situations where a”race” for some discovery occurred, one labsucceeded while others failed (Crick, 1988 ;Judson, 1979). Moreover, in every successstory, a period, sometimes very long, of failureto reach the goal precedes the final success.Although a great deal of information is thereforeavailable on the conditions that have to besatisfied to turn failure into success, the myriadfailures of scientific discovery remain under-reported and under-examined. Laboratorystudies can be of special value in thisconnection, for they allow investigators tocontrol precisely the variables that arehypothesized to affect success and failure (e.g.,Dunbar, 1993; Gorman, 1992; Penner & Klahr,1996), or to examine some of the detaileddifferences in processes used by successful andunsuccessful participants.

External validity. To what extent the resultsof a study can be broadly generalized is thequestion of external validity. In any particularinstance of empirical research on discovery, onlya limited task being addressed by a specificpopulation (or person, in the case of historicalstudies) is investigated. This true not only forlaboratory studies but also for historical studiesand in vivo investigations. As the totality ofdomains in which discoveries may be made isboundless, each piece of research may beregarded as a "case study." The challenge is tobring such studies together so that a theory ofdiscovery can emerge from them.

External validity is related to the “child-as-scientist” debate mentioned earlier. Althoughthere is no question about children’s inadequacyrelative to adults when it comes to skills that aredomain-specific, laboratory study or naturalobservation may reveal, in specific contexts, thatchildren have surprising competence in someaspects of scientific reasoning. Nevertheless,until flexible and adaptive deployment of suchcompetencies is demonstrated, the externalvalidity of such findings remains in question.

Social and Motivational Context . In thispaper we will be concerned primarily withinternalist accounts of discovery, but even ifexternal factors are not the focus of study, theymay still provide alternative explanations forphenomena that have been attributed to internalfactors. At a minimum, no study of discoverycan disregard the central fact that individualscientists or groups of scientists are always partof a wider social environment, inside andoutside science, with which they are in constantcommunication and which has strongly shaped

their knowledge, skills, resources, motives andattitudes.

The interaction between social and cognitivefactors is beautifully illustrated in Thagard’s twopart account of the discovery of the bacterialorigins of stomach ulcers (Thagard,forthcoming a, forthcoming b). This accountdemonstrates how the weight of empiricalevidence can overwhelm even the mostentrenched and socially accepted scientificbeliefs. The initial view that ulcers were causedby bacteria was viewed as “preposterous” whenfirst proposed in 1983, but has, by now,achieved nearly universal acceptance. Thereasons for both the initial and final positionsclearly involve important social mechanisms.Nevertheless, as Thagard cautions:

It is important not to succumb to the sloganthat science is a social construction.Proponents of that slogan tend to ignoreboth the psychological processes of theoryconstruction and acceptance .... and thephysical processes of interaction with theworld via instruments and experiments.Undoubtedly interests and social networksabound in the ulcers case as in otherepisodes in the history of science. Butexplaining scientific change solely on thebasis of social factors is as patentlyinadequate as purely logical andpsychological explanations.Thus, acknowledgment of the role of non-

cognitive factors in the process of scientificdiscovery does not plunge such research into thesocial constructivist pit. We fully concur withGiere's (1988) wry critique of extreme versionsof the doctrine of cultural relativism sometimesembraced by constructivists.

Constructivists have no qualms aboutassuming the reality of other people. Theyare perfectly willing to explain Jones’actions by reference to a conversationJones had with Smith. Is that not assumingtoo much? Should we not rather say thatJones believed he had a conversation withSmith? Surely that would be silly.Restricting explanations of physicists’activities to invoking only their beliefsabout protons, rather than protonsthemselves, is just as silly. (p127)

Strengths of the several methodologies.How do the different approaches to studying

discovery fare on the evaluative criteriadescribed above? In Table 1 we provide asuccinct depiction of their relative merits.

Page 10: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 10

However, the table requires some interpretation,for matters are not as simple as it makes themappear. In order to give a coherent appraisal ofeach approach, our interpretation will proceedcolumn-wise (by approach) keeping in mind thefact that the row-wise (by criterion) comparisonsare particularly informative (e. g., the relative

face validity of Historical Study versusLaboratory Experiment). In the followingdiscussion, we limit the explanation of ourevaluation mainly to cells in Table 1 with entriesthat are high (***) or low (* or empty). At thislevel of analysis any more precise comparisonswould be difficult to justify.

Table 1 Dimensions of Strength of Four Approaches to Research on Scientific Discovery.

Type of ApproachHistoricalStudies

Laboratory Studies DirectObservation

ComputationalModeling

Evaluative Criteria Exploratory Controlled1. Face validity *** * *** *2. Construct validity * * ** ** **3. Temporal span and resolution

Short & fine-grained * *** *** ** ***Long and coarse-grained *** * **

4. New Phenomena *** *** * *** **5. Rigor & precision * ** *** ** ***6. Control & factorability * *** * ***7. External validity * * ** * **8. Social & motivational factors *** *** *

Note: Each approach is evaluated on the criteria as either very high (***), high (**), modest (*), or poor (no entry).

Historical studies. We give historical studieshigh marks for face validity because, bydefinition, they investigate the very phenomenonthat they seek to explain. Topic and scope arechosen after the fact, so there is no doubt that a“real” discovery by a “real” scientist is the focusof investigation.. The temporal resolution ofsuch investigations depends on the sources ofdata available. It is quite coarse when theprimary sources are publications; can becomemuch finer — on a scale of days, say — to theextent that laboratory notebooks andcorrespondence are available. Historical studiesseldom permit the study of anythingapproaching minute-by-minute, or even hour-by-hour sequences of the scientists' thoughts.

Given the unique and often idiosyncraticaspects of the interaction between a particularscientist, a particular state of scientificknowledge, and a particular discovery,historical studies are highly likely to generatenew phenomena. The down side of thispotential for novelty is low external validity,with respect to generalizability to "all" scienceand "all" scientists. Such careful investigationsas Gooding's analyses of Faraday;s extensive

and meticulous notebooks (Gooding, 1990) areof necessity limited to a single scientist.However, as such studies cumulate they can betreated collectively as a sample of events thatcan be compared and contrasted to find theunderlying general laws.

We give this approach low marks on rigorand precision because of the subjective andunverifiable nature of much of the raw data.Even when based on daily lab notebooks, thedata are subject to all of the self-reporting biasesthat make retrospective verbal reports lessrigorous than concurrent verbal protocols(Ericsson & Simon, 1993. And when theaccounts are based on the recollections andintrospections provided in autobiographies ofgreat scientists (e.g., Hadamard, 1945; Poincaré,1929), or by systematic interviews withscientists about their work (e.g., Rowe, 1953)reliability is always in doubt: "But did theyreally think this way?" asks Nersessian (1992) ofher own analysis of Maxwell's discovery ofelectromagnetic field theory . "In the end we allface the recorded data and know that every pieceis in itself a reconstruction by its author" (p. 36).

Page 11: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 11

Finally, historical studies rank high on theirtendency to address both social and motivationalfactors surrounding the discovery process. Thisis true, in part, because historical studies pre-date the emergence of the cognitive sciences,and to the extent that they treated psychologicalvariables at all, they tended to concentrate onnon-cognitive types of psychological variables.

Laboratory studies. Enough has already beensaid about laboratory studies to indicate theirstrengths. Their chief limitations are in facevalidity — it is seldom possible to study in thelaboratory discoveries like those that fill thehistories of science, although we will discusscases where this has been done. Controlledlaboratory studies are especially adapted to theneeds of verification in a theory-drivenparadigm. As usually designed, with concernfor clear tests of well-defined hypotheses usingexperimental controls, the controlled laboratorystudy reduces, although it does not exclude, thelikelihood of wholly unanticipated outcomesthat involve new variables or new phenomenathat were not considered in the experimentaldesign. Exploratory laboratory studies sacrificethe rigor and precision of controlled experimentsin favor of their potential for producinginteresting new phenomena. Both types oflaboratory studies tend to generate fine-graineddata over relatively brief periods, and theytypically ignore or attempt to minimize theeffects of social and motivational factors on thediscovery process.

Direct observation. Direct observations ofon-going science have many of thecharacteristics of historical data -- in particular,high face validity and potential for detectingnew phenomena. However there are twoimportant differences between historical anddirect approaches. First, the observations mayachieve much finer-grained temporal resolutionof ongoing research processes than historicalresearch. Second, direct observation provides alevel of rigor, precision, and objectivity that islacking in retrospective accounts by scientists oftheir discoveries.

Computational modeling. Our evaluation ofcomputational modeling derives from our viewof it not as a method for gathering data but as amedium for generating theories and forrepresenting and testing them against data thathave been obtained by the other methods.Hence it is clearly not a substitute for the others,but complementary to them. One of itsimportant applications is to provide tests of thesufficiency of the mechanisms postulated in atheory of discovery to actually produce the

discovery. The model will be unable to achievethe discovery unless it does possess a sufficientset of mechanisms, appropriately organized.

The modeling approach can achieve highexternal validity by using a single model tosimulate behavior in a whole range of discoverytasks (Kulkarni & Simon, 1990). To this end, amodel of discovery, like any theory, must befactored into two components: (1) its basicmechanisms, retained without alteration fromone application to another; and (2) specificknowledge of the content and research methodsof each task domain to which it is applied. Thefirst component reveals the extent to whichgeneral methods can account for discoveriesover a range of domains, and the secondcomponent indicates the extent to which adiscovery relies on domain-specific knowledgeand methods. A general theory of discovery canemerge from the components of the such modelsthat are common to many tasks.

Of course, this separation of the general fromthe specific is not limited to formal simulationmodels, but extends to general theories ofdiscovery, however expressed. What is specialabout simulation models is the rigor with whichthey can be stated, and the clarity with whichgeneral and task-dependent elements in thesystem can be distinguished. Simulationsprovide us with powerful methods forcomparing the theoretical implications of thedata from particular case studies, interpreting thedata in a common formal language that canreveal both the identity or similarity of processesthat were involved in each case and thedifferences among them .

The construct validity of modeling dependson the construct validity of the tasks that aremodeled. Modeling enables us to express atheory rigorously, and generally, to simulatephenomena at what ever temporal resolution andfor whatever durations are relevant. The effectsof changes in particular variables can be studied,and other variables can be held constant.Theoretical variables are clearly specified so thatconstruct validity is high.

Social variables can, in principle, beincorporated in models, although, with theimportant exception of including the specialized,and socially acquired knowledge of the domainexpert, this has not usually been done inmodeling discovery. On a large scale, it wouldbe possible to model a scientific communityrather than a single scientist, the members of thecommunity being linked, for example, by the"blackboard" of publication. Limits on speed of

Page 12: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 12

computation may require a trade-off betweenfine temporal resolution and long durations.

Assessing the Approaches: SummaryIt should be clear from this brief exercise in

comparative assessment that there is no singlebest way to study the discovery process.Research on the science of discovery is subjectto the same inevitable tradeoffs that characterizeresearch methodologies and paradigms in allscientific disciplines. But these tradeoffs do notimply that the different approaches areincompatible. To the contrary, the fundamentalthesis in this paper is that the findings from thesediverse approaches, when considered incombination, can advance our understanding ofthe discovery process more than any singleapproach. In the final two sections of this paper,we will attempt to show how the methodscomplement one another and how theircomplementarities are beginning to produceconsistent convergent evidence about theprocess of scientific discovery. But first, inorder to provide a common language fordescribing these complementarities andconvergences, we introduce, in the followingsection, set of concepts and terms that havebeen used to characterize human problemsolving.

Scientific Discovery as Problem SolvingWe argued earlier, in citing Francis Crick's

account of the discovery of DNA, that majorscientific discoveries are so labeled because theknowledge that they produce is important andnot because they derive from any unusualthought processes. Psychologists have beenmaking the case for the "nothing special" viewof scientific thinking for many years (e.g.,Simon, 1966), pointing out that information-processing theories of human problem solvingcould account for many of the "unique"characteristics of scientific discovery. This viewhas been elaborated more recently—particularlywith respect to the issue of creativity—byseveral others, including Boden (1990), Perkins(1981), Simon, Langley, and Bradshaw (1981)and Weisberg (1993). The success ofcomputational models like BACON andKEKADA — based as they are on a small set ofrelatively straightforward heuristics for findingregularity in existing data sets — providesfurther support for this position.

This view does not imply that the averageperson could walk into a scientist's lab andproceed to make discoveries. Practitioners of ascientific discipline must acquire an extensive

portfolio of relatively particular methods andtechniques during their long professionaltraining, and must apply their skills in thecontext of an immense, cumulative, base ofshared knowledge about the discipline'sphenomena, theories, procedures,instrumentation, experimental paradigms, anddata-analytic methods, not to mention its history,funding procedures, social and politicalimplications, institutional structure, and even itspublication practices (see Bazerman, 1988).

These components of expertise constitute the“strong” or “domain-specific” methods. Theprocesses we are focusing on are the “weakmethods”: domain-general, universal, problem-solving processes. Although the strong methodsused in scientific problem solving distinguishthe content of scientific thinking from everydaythought, we claim that the weak methodsinvoked by scientists as they ply their trade arethe same ones that underlie all human cognition.In this section we sketch very briefly some ofthe basic components of the general theory ofproblem solving in order to provide a commonlanguage for discussing the convergence ofdifferent approaches to the study of discovery.

Problem solving, search, and weak methodsA problem consists of an initial state, a goal

state, and a set of operators for transforming theinitial state into the goal state by a series ofintermediate steps. Operators have constraintsthat must be satisfied before they can be applied.The set of states, operators, goals, andconstraints is called a “problem space,” and theproblem solving process can be characterized asa search for a path that links initial state to goalstate (Newell and Simon, 1972).

Initial state, goal state, operators, andconstraints can each be more or less well-defined. For example, one could have a well-defined initial state and an ill-defined goal stateand set of operators (e.g., make “somethingpretty” with these materials and tools), or an ill-defined initial state and a well-defined final state(e.g., prove a particular mathematicalconjecture). But well-definedness depends onthe familiarity of the problem space elements,and this, in turn, depends on an interactionbetween the problem and the problem solver.More specifically, it rests on the process ofrecognition. Before any search process can beapplied, its relevance must be recognized by thedetection of appropriate patterns in the situation.Observation of such patterns evokes informationabout the situation that can help guide thesearch. As this information usually is domain

Page 13: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 13

specific, the recognition mechanism tends tomake it available just where it is potentiallyrelevant (i.e., instances of positive transfer).Such recognition is not always productive, aswitness cases of negative transfer, functionalfixedness (Dunker, 1945), and Einstellung(Luchins, 1942).

Although scientific problems are much lesswell-defined than the puzzles commonly studiedin the psychology laboratory, they can becharacterized in these terms. In both cases,well-definedness and recognition depend notonly on the problem, but also on the knowledgethat is available to the problem-solver. For thatreason, much of the training of scientists isaimed at increasing the degree of well-definedness of problems in their domain.

In all but the most trivial problems, thesearch process can be quite demanding. If werepresent the problem space as a branching treeof m moves with b branches at each move, thenthere are bm moves in the full problem space.As soon as m and b exceed small values,exhaustive search of the space is beyond humancapacity. Thus, effective problem solvingdepends in large part on processes that constrainsearch judiciously to the exploration of a fewbranches.

Search constraint processes may includeweak methods and strong methods. Weakmethods, while requiring little knowledge of theproblem structure, are correspondinglyunselective in searching the problem space.Strong methods may find solutions with little orno search. For example, someone who knowsthe calculus and is seeking the maximum of afunction applies a known algorithm (taking thederivative and setting it equal to zero), findingthe answer without search. But it is up to therecognition process to detect the fit between agiven problem and the maximization of acontinuous function. We will describe five majorweak methods: generate and test, hill climbing,means-ends-analysis, planning, and analogy.

Generate and Test. This method, often called“trial and error”, consists simply of applyingsome operator to the current state and thentesting to determine if the goal state has beenreached and the problem solved. If it hasn't,then some other operator is applied. Anexample of a “dumb” generating process wouldbe searching in a box of keys for a key to fit alock, and tossing failed keys back into the boxwithout noting anything about the degree of fit,.A slightly “smarter” generator would, at theleast, try each key only once.

Hill climbing. In hill-climbing, one makes atentative step in each of several directions, andthen heads off in the direction that has thesteepest gradient. More generally, the methodcomputes progress in the direction of the goal.The move that shows most progress is chosen,and then the process iterates from the new state.Hill climbing utilizes more information thangenerate-and-test about the direction anddistance of the goal, and uses that information toconstrain search in the problem space.

Means-ends analysis. Means-ends analysiscompares the current state and the goal state, anddescribes the differences between them. Then itsearches for an operator that is designed toreduce the most important differences (Dunker,1945; Newell & Simon, 1972). If the conditionsfor the applicability of the operator are not meta subgoal is formulated to reduce the differencebetween the current state and a state in which thedesired operator can be applied. Thus themethod attempts to solve the subproblemrecursively.

Planning. Planning involves (a) forming anabstract version of the problem space byomitting certain details of the original set ofstates and operators, (b) forming thecorresponding problem in the abstract problemspace, (c) solving the abstracted problem byapplying any of the methods listed here(including planning), (d) using the solution ofthe abstract problem to provide a plan forsolving the original problem, (e) translating theplan back into the original problem space andexecuting it (Newell & Simon, 1972).

Preparing a meal is a complex problem-solving task. A plan might be: go into thekitchen, select a menu, prepare each dish, set thetable, serve the meal. Because planningsuppresses much of the detail in the originalproblem space, it is not always possible toimplement the plan, for some of the steps inplanned solution paths may not be achievable.For example, an essential ingredient for one ofthe items on the planned menu may not be onthe shelves.

Analogy. Analogy involves mapping a newtarget domain onto a previously encounteredbase domain (Vosniadou & Ortony, 1989).Mappings vary widely in complexity. In theirsimplest manifestation they involve simply therecognition that the current problem can besolved by a known procedure. At the otherextreme, the mapping process may be quiteelaborate (Gentner & Jeziorski, 1989; Halford,1992) and, like the other weak methods, notguaranteed to produce a solution. Analogy can

Page 14: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 14

be viewed as one method for changing the givenproblem space to another that is more effective.

Analogical mappings thus provide theprincipal bridge between weak and strongmethods when the source of the analogy is awell-defined procedure. Used in conjunctionwith domain-specific knowledge, analogy mayenable the search process to be greatly abridgedwhen patterns are noticed in the current problemstate. Pre-stored knowledge can be evoked andused to plan the next steps toward solution of theproblem, provide macros to replace wholesegments of step-by-step search, or even suggestan immediate problem solution. Therecognition mechanism (with its associated storeof knowledge) is a key weapon in the arsenal ofexperts and a principal factor in distinguishingtheir performance in the domain of expertisefrom that of novices.

Problem Solving: SummaryScientific practice applies a plethora of strong

methods, such as standard experimentalparadigms, established theories and knownparameter values, specialized instrumentation,and even highly constrained publication formats.Strong methods can admit, as in themaximization example above, the directapplication of a method with little or no search.Weak methods, though less effective whenstrong methods are available, are of specialinterest for a theory of scientific discoverybecause they are applicable in a wide variety ofcontexts, and because fewer and fewer strongmethods remain available as the scientistapproaches the boundaries of knowledge .Moreover, analogy, though a weak (i.e., verygeneral) method, draws on all the domainspecific knowledge and skill stored in memory.

Especially central to the weak methods arethe processes of selective (heuristic) search andthe habit of storing in long-term memory largebodies of domain-specific information,"indexed" by recognizable patterns, so that itsrelevance will be evoked in particular situations,and the information will become accessible. Tostudy scientific discovery we have to find outhow to observe these and the other weakmethods at work on scientific problems, or toevoke them experimentally in contexts thatmimic some of the richness of actual researchcontexts, while at the same time, maintaining theobjectivity that supports sound inference.

Complementarity of ApproachesOur theoretical framework views scientific

discovery as a type of complex problem solving.This framework provides a common language

that can be used to describe bothcomplementarity and convergence in the variousapproaches to the study of scientific discoveryA powerful way to exploit complementarity is tostudy the same scientific discovery using morethan one approach. In this section, we giveseveral examples of how the strengths ofdifferent approaches can be complemented andtheir weaknesses attenuated or eliminated byusing them together. We begin with historicaland laboratory studies, whose strongcomplementaries are revealed by Table 1.

Combining Historical with Laboratory StudiesThe discovery of genetic control. In the late

1950’s, Jacques Monod and François Jacobdiscovered the mechanisms by which thesynthesis of lactose is controlled in bacteria bycontrol genes (Jacob & Monod, 1961). For thisdiscovery, Monod, Jacob, and their mentorAndré Lwoff, were awarded the Nobel prize in1965. A substantial historical literature examinesthe discovery (e.g., Judson, 1979, 1996)including an autobiographical account (Jacob,1988). As is sometimes the case in historicalanalyses of scientific discovery, cognitiveprocesses are given an important role inJudson’s account of Jacob & Monod’s work,although the historian necessarily treats theseprocesses at a very general -- almostmetaphorical -- level:

Scientists reach an extreme, sustainedidentification of the patterns of theirthought with the patterns that they perceivein, and project into, the phenomena theyare trying to elucidate. ... In each of thediscoveries he (Monod) made in theensuing ten years there was a moment oftotal absorption as he resolved the bacterialcell like a partly cut diamond slowly in thelight, then a gleam of perception so quick --and so quickly resolved into its place in thesequential pattern -- that to Monod himself,on his testimony at least, there had beennothing much to call an intuitive leap,merely an extension, subject to test, of theinevitable logic of the system itself.(Judson, 1996, p.387)But for a cognitive psychologist, to

characterize Monad's discoveries in terms of a"gleam of perception" is to not describe them atall. Instead, the goal is to identify specific andwell-understood cognitive processes and then todetermine their role in the discovery. In the caseof the discovery of the mechanism of geneticcontrol, perhaps the most important cognitiveprocesses involved were the representational

Page 15: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 15

changes that enabled Jacob & Monod to replacetheir entrenched idea that genetic control mustbe some kind of activation mechanism with thediscovery that it was, instead, an inhibitionmechanism. As Dunbar (1993) puts it:

They made the novel and unexpecteddiscovery that groups of genes controlother genes and keep many genes inhibiteduntil particular enzymes are needed. Thiswas the novel concept of an ‘operon’ thatforced a ‘radical restructuring’ of theconcept of how genes work. ( p. 398)In order to better understand the cognitive

processes involved in this important discovery,Dunbar (1993) created a laboratory task that heused to study the behavior of college studentsfaced with a problem that captured some of theessential elements of the discovery problemfaced by Monod and Jacob, while eliminatingmany others3. Dunbar’s simplifications wereprimarily aimed at constraining the search spaceand at keeping the depth of search withinreasonable limits. His three principlesimplifications were: (a) to have participantsdiscover how genetic regulation worked(whereas Monod and Jacob first had to discoverthat there was such a thing as genetic regulationof some genes by other genes); (b) to provideparticipants with a highly constrainedexperiment space (whereas Monod, Jacob, andtheir colleagues had to invent many newprocedures); (c) to limit the necessary discoveryto the particular instance at hand, rather than to abroad-based concept of genetic control.

In summary, Dunbar placed his participantsin an experimental context that simulatedMonod and Jacob's problem at the point wherethe idea of "control genes" had occurred to themand they had developed a basic experimentalprocedure for testing alternative control-genehypotheses to explain the lactose phenomena.The students were asked to design and run(simulated) experiments to discover the lactosecontrol mechanism. Using a real scientific taskincreased the typically low face validity of acontrolled laboratory study. Although the actualtask was simplified for purposes of theexperiment, some of the basic components —the problem, the "givens," the research methodspermitted by known kinds of experiments, the 3. Clearly, no laboratory simulation could present thefull challenge presented by scientific problems.Monod and Jacob worked together for several yearsbefore they made their discovery, and most collegestudents would prefer to spend less time than that inthe psychologists’ laboratory.

structure of the solution — were all preserved.With good control of the variables, thelaboratory data could cast light on the size andstructure of the problem spaces that Monod andJacob searched, and on some of the conditions ofsearch that were necessary or sufficient forsuccess.

Despite the differences between theoriginal discovery of Monod and Jacob andthat observed in the studies reported here,clear similarities exist between theconceptual processes employed by thesubjects and those employed by Jacob andMonod. ... The problem for the subjectswas to conceive of a new mechanism thatcould be applied to genetic regulation. Thatis, the subjects generated a new concept ofmutually interacting genes that regulateenzyme production by inhibition. Thesesubjects behaved just like Monod andJacob. Furthermore, just like the subjects inthe experiments reported in this article,Monod and Jacob had difficulty informulating the concept of inhibitorycontrol due to their belief in activation: Inthe 1940's, Monod began investigating theconditions under which E. coli could beinduced to produce certain enzymes.Monod hypothesized that this induction ofenzymes was an activation mechanism, or apositive process. (Dunbar, 1993, p. 431.)Planck's Law. In at least three other cases,

historically important discoveries that have beenstudied extensively by historical methods havebeen the subject of complementary experimentsin the psychology laboratory. In 1900, MaxPlanck, having published a theory of blackbodyradiation that accounted for the fit of theobserved data with Wien's Law (an exponentialfunction), learned that new data in the infraredrange showed a large departure from Wien'sLaw. The data still looked exponential in thehigher frequencies, but passed nearly linearlythrough the origin. On the evening of the sameday on which he learned about the new infrareddata, Planck revised Wien's Law into what wenow call Planck's Law (and, in the process ofproviding an explanation for the new law duringthe next several months, more or lessaccidentally discovered the quantum).

To learn more about how the first step wasaccomplished, some mathematicians andphysicists of National Academy stature wereapproached with the following question: "I havesome very smooth and noise-free data relatingtwo variables, x and y. For large values of x, yappears to be an exponential function of x; but

Page 16: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 16

for small values of x, the function passesthrough the origin and is nearly linear. Can yousuggest what function might fit these data?"(Langley et al., 1987, pp. 47-53)

Five out of the eight scientists who wereasked the question answered it, and in wellunder three minutes. The answer was alwaysessentially Planck's Law. When asked how theyreached their answer, the respondents were ableto report either (1) that they expanded theexponential into a Taylor's Series and noted thatif they subtracted unity from it, it would havethe desired properties; or (2) that they visualizedthe graph of the function, and saw it cutting they-axis at y=1, then subtracted unity from it. Inno case did they notice the relation of theirproblem and solution to Planck's Law ofblackbody radiation, although all of them werethoroughly familiar with that law.

Planck published an account of how hehimself found the function that fit the new data.His path is slightly different from those givenabove, but equally simple. Looking at hisprevious theory, he quickly found that he couldget the desired result by adding a quadratic termto a particular linear equation in the derivation.He then spent two or three months trying torationalize the change in terms of the physics ofthe situation. (His numerical law of blackbodyradiation is still accepted, but his physicalexplanation bears little relation to today'squantum mechanics.) This complementarity ofhistory with experiment greatly reduces themystery of how Planck's initial step — thechange in the key function — could be achievedby Planck in a few hours. Neither therespondents in the experiment nor Planck madeuse of the physics of the situation in taking thatstep; it was pure "numerology."

Balmer's Law. A second example is providedby Balmer's law,, a simple algebraic formula forthe frequencies of successive lines in thespectrum of hydrogen: x = kn2/(n2-4), where xis a frequency, k is a constant and n is thesequence number of the line with that frequency(n≥3). The law was discovered by Balmer, ageometry teacher who was thoroughly innocentof the physics of the problem, after seeing thevalues of the first four spectral lines (Banet,1966).

To learn more about the problem space inwhich this solution was found, a graduatestudent in engineering was hired for a summer,given the spectral data (simply as a sequence offour values of x and n), and asked to find apattern that could be extrapolated to higher

values of n. He found the equivalent of Balmer'slaw after about two months, working perhaps 10to 20 hours per week on the problem. Here, aswith Planck's Law, an important physical lawwas discovered by a pure exercise in patternfinding, without physical motivation or theory.(Thirty years later Balmer's Law served as theprincipal evidential base for Rutherford'squantum theory of the hydrogen atom. ) In bothcases, the evidence from a laboratory studycomplements the less detailed historicalevidence in revealing the nature of the discoverypath.

Faraday's Experiments. Laboratoryexperiments of quite a different kind arereplications by historians of science (notsubjects) of the historical experiments that arebeing studied. Gooding (1990) has emphasizedthat experiments don't interpret themselves.Even the process of describing laboratoryexperiments, and representing the instrumentsand the observations in language that permitsother scientists to understand and replicate themis a problem, often difficult, that has to be solvedby the investigator.

Between late August, 1831, and earlyDecember, Faraday carried out the keyexperiments that demonstrated the phenomenaof induction of currents by magnets. As hebegan to draft his first paper on this work, hediscovered that he had difficulty in interpretinghis own experiments, as recorded in his labdiary, and especially the directions of themagnetic forces and currents. He took about aweek to replicate some of his experiments and todevelop a consistent way of representing anddescribing his manipulations and findings so thathe could communicate them clearly to his fellowscientists.

Gooding (1990) has given us a careful andinsightful analysis of the same process whenFaraday, in 1821-22, published a survey articleon the status of electromagnetic research shortlyafter Oersted had made his surprising discoveryof the generation of a magnetic field by anelectric current. Gooding points out thatFaraday repeated most of the experiments hewas reviewing in order to understand, represent,and describe the manipulations and findingsunambiguously, and that he sometimessupplemented his written communications to hiscolleagues with small examples of theexperimental apparatus that enabled them torepeat the experiments themselves. And mostinstructive for our present discussion, Goodinghimself found it highly informative to repeatagain many of the experiments in order to

Page 17: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 17

understand the problems of experimentation andcommunication faced by Faraday and hiscontemporaries.

In all these cases of complementarity betweenhistorical and laboratory approaches, we see thatthe use of a historically important scientificdiscovery as the substance of the task solves theproblem of face validity for the laboratory data,and that the laboratory provides valuable newinformation about the processes of discovery at ahigh level of temporal resolution — of the orderof minutes and seconds. In combination, theseapproaches cut the Gordian Knot of face validityand external validity.

Combining History with ModelingOur next example illustrates the

complementarity of historical and model-building approaches. The path that thebiochemist, Hans Krebs, followed in hisdiscovery of the reaction path for the in vivosynthesis of urea has been the subject of a verycareful and thorough historical study by Holmes(1991), who used not only the published papersbut also the lab notebooks of Krebs and hisassistant Henseleit, and conducted interviewswith Krebs (some forty years after the discoverywas made).

Recently, two computer models of scientificdiscovery (KEKADA, by Kulkarni and Simon,1988; CDP, by Grasshoff and May, 1995) havebeen applied to modeling the urea synthesiscase. After the models proposed an experimentand were given its outcome, they then proposedanother experiment, using the knowledge ofprevious outcomes to select the search path.Both programs, using no more knowledge ofbiochemistry than Krebs possessed at the outsetof his work, succeeded in discovering thereaction path discovered by Krebs, followingfairly closely the same lines of experimentation.

The simulations showed that theexperimentation could be steered by verygeneral hypotheses (e.g., hypotheses, alreadywidely accepted, that ammonia and amino acidswere likely sources for the nitrogen in urea.), sothat experimental outcomes generally guidedtheorizing, rather than theory guidingexperimental design. The simulations alsoshowed that surprise at unexpected experimentaloutcomes could provide powerful heuristics forchoosing the next steps in search. The modelssharpened up considerably the ambiguities inchoice of strategy that Holmes had detected atvarious points in Krebs' search. The KEKADAprogram has also simulated some aspects ofFaraday's 1831 discovery of induction of

electricity by magnetism, including the effects ofthe surprise experienced at the outcome of hisfirst experiment.

There has been extensive computer modelingof other important historical discoveries, butwith less detailed comparison than in theexamples just cited between the paths ofdiscovery inferred from the historical materialsand those followed by the simulation programs.In this research, the emphasis has been uponfinding general mechanisms of discovery thatare effective over a wide range of tasks, in thesense of being able to reproduce the product, ifnot always the (largely unknown) details of theprocess, of the corresponding historicaldiscoveries.

BACON, already mentioned, is a widelyknown model of this kind which has been usedto simulate such discoveries as Boyle's law,Kepler's third law, Ohm's law, Coulomb's law,Archimedes' law, Black's laws of heat, variouslaws of chemical combination, the law ofgravitation, conservation of momentum, andSnell's law of refraction, as well as to performconcept attainment and series extrapolation tasks(Langley, et al., 1987, Part II). While BACONexists in a half dozen variant forms, all of themdepend upon a small set of heuristics that guidethe generation of mathematical functions to fitthe given data to a good approximation. HenceBACON is a heuristic search system that uses agenerate-and-test subsystem to generatehypotheses for comparison with data, thehypothesis generator being sensitive to feedbackof the results of trying to fit the previoushypotheses.

BACON's heuristics, which are strictly weakmethods, contain no information about themeaning of the data, so that the program'sdiscoveries are describable as data-driven ratherthan theory-driven. Because BACON carefullydesigns each new function it generates with theaid of information about previous fits or misfits,it typically generates only a few functions beforefinding one that fits the data. Thus BACONthrows light on those discoveries of science,especially frequent in the early years of any newfield, where little or no theory is initiallyavailable to guide experiment.

BACON is just one of a family of models ofvarious kinds of data-driven discovery. Othersinclude STAHL, GLAUBER, and DALTON(Langley et al., Part III). A characteristic of thisline of research is that it begins to fill in thelarge lacuna in the literature of scientificmethodology, which has tended to neglect data-driven discovery, observation and exploratory

Page 18: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 18

experiments, and has paid attention almostexclusively to controlled experiments as a meansfor testing theories that have already been fullyformulated. An important product of themodeling has been to complement historicalstudies where the science is in large measuredata-driven, but where historical data are notsufficiently fine-grained to show howobservations of phenomena can guideexperimentation (e.g., the studies of the work ofKrebs and Faraday) .

Page 19: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 19

History, Lab, and Model One example can be cited where historical

data, a model, and a laboratory experiment haveall been used to provide complementaryanalyses of the same historically importantdiscovery: Kepler's discovery of his Third Lawof Planetary Motion, which states that theperiods of revolution of the planets about theSun vary with the 3/2 power of their meandistances from the Sun. The historical record ofthis discovery is very sparse (Gingerich, 1975),consisting largely of Kepler's own publishedaccounts, first of his "discovery" of an erroneouslaw (that the periods vary with the squares of thedistances); then of his discovery of the correctlaw two decades later.

From the history we know some of thedifficulties he encountered (especially errors inarithmetic), and the distractions of his life whenthe problem lay fallow. We know that he did nothave logarithms available at the time he foundthe law. We are almost wholly lacking fine-grained temporal data on the stages of discovery.There was little physical motivation for the law,although Kepler held some notions about theSun as the source of force that produced therevolutions, and these notions may have pointedhim to the square law — certainly not to theD3/2 power law. But the central question is:what led him, with only a few weeks' activesearch, to this particular mathematical function— P = D3/2 — out of all the functions he mighthave generated? His problem space of possiblefunctions was enormous, and that he found thisparticular one relatively quickly calls forexplanation.

When BACON is given the same data thatKepler had, and nothing more, it finds thecorrect law as the third or fourth function that itgenerates, the exact order depending on theprecise heuristics it uses. In either case,BACON's heuristics guide it almost directly tothe answer after trying no more than one or twoinadequate alternatives. What is perhaps moreremarkable is that the second function it tries isthe quadratic — the false answer that Keplerproposed in his initial publication on the law.Given the sketchy historical evidence, BACONdoes a remarkable job of tracing the originaldiscovery path. Whether it followed that path forthe same reasons that Kepler did cannot beanswered with the data that are available.

To complement further the data availablefrom history and from the BACON simulation,

Qin and Simon (1990) conducted tests withlaboratory participants, giving them Kepler'sdata (the variables identified only as x and y)and asking them to find a function that fit thedata. Of fourteen college students, four foundKepler's Third Law in an hour or less; the otherten failed. Of those who failed, four had weakmathematical backgrounds, and generatedscarcely any functions but straight lines. Theremaining six generated a variety of functions(not including the right one), but showed noevidence that the results from an unsuccessfulattempt to fit a function had any influence onwhat function they chose to test next. On theother hand, the four participants who found thelaw all used selective heuristics to choose thenext function for testing according to the natureof the misfits of those previously tried. Theirheuristics had a close resemblance to those ofBACON (with which they were not familiar).

The values of experimentation and modelingin providing evidence about feasible and likelysearch paths in the absence of much historicaldata to identify the path actually followed tomake the original discovery are well illustratedby this example.

Laboratory studies: exploratory and controlledIn some cases, the two types of laboratory

studies -- exploratory and controlled -- can befocused on a common problem. In Klahr &Dunbar’s (1988) original study of BigTrak, therewere no control conditions. Instead, the projectwas conceived as an exploration of what wouldhappen when participants were presented with arelatively complex (for a lab study) discoverytask. The results of this study, as notedelsewhere, led to the discovery of two distinctstrategies for approaching the discovery problem(i.e., the Theorist - Experimenter distinction).This exploratory study was then followed upwith very carefully designed factorialexperiments that controlled for the age andscientific background of participants andintroduced different levels of plausibility of thesuggested hypothesis that participants wereasked to explore (Klahr, Fay, & Dunbar, 1993).This was followed by a series of investigations(reported in Klahr, forthcoming) that alternatedbetween exploratory and controlled laboratoryexperiments, yielding a number of interestingfindings on the development of scientificreasoning processes.

Page 20: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 20

Convergent Evidence of Principles ofDiscovery

In the previous section, we focused on thecomplementarities of the different approaches.In this section we give several examples of thekind of convergent evidence obtained by usingtwo or more approaches to study the samescientific discovery. We will provide a fewexamples of how basic processes of discoveryrevealed in one situation can be tested andgeneralized to other situations. Suchcomparison takes us from the limitations ofindividual case studies to the construction andtesting of general theories.

SurpriseIn this century, the reigning theories of

philosophy of science have generally takenhypotheses as uncaused (or at least unexplained)causes that lead to experiments designed to testthem. In this view, the hypotheses themselvesderive from scientists' "intuitions" and arebeyond scientific explanation. (Popper, 1959).The history of science has taken a much lessrigid position with respect to hypotheses, andhas included the question of their origins withinthe scope of its interests and methods.

For example, historical accounts of thediscovery of radium by the Curies usually startwith their project to obtain pure radioactiveuranium from pitchblende, in order to study thebehavior and properties of uranium. They werefamiliar with the level of radioactivity ofuranium, and as they proceeded to process thepitchblende, they were surprised to find that thelevel of radioactivity began to exceed that ofpure uranium. A surprise calls for anexplanation, and the explanation that occurred tothem was that the pitchblende contained asecond substance (which they named radium)that was more radioactive than uranium. The testof this hypothesis consisted in extracting thissubstance, separating it from both thepitchblende and the uranium, and determiningsome of its key properties.

In this case a phenomenon led to ahypotheses, rather than a hypothesis toexperimental phenomena. This is not a singularcase in scientific history, but a frequentoccurrence. Often, as was true in this instance, itis accompanied by surprise; that is, the observedphenomena was unexpected and unpredictablefrom the knowledge the scientists already held.

A surprise can only occur whenexpectations that have been formed are violated.In observational studies and exploratory

experimentation, phenomena are, from time totime, recognized as conflicting with previouslystored knowledge and expectations about theproblem domain. In the face of surprise,scientists frequently divert the path ofexploration to ascertain the scope and import ofthe surprising phenomenon and to determine itsmechanism. (See Darden, 1992 for an analysisof responses to anomalies based on the historicalrecord, and Chinn & Brewer, 1998, for alaboratory investigation of how people respondto anomalous data).

We have mentioned Fleming's response tosurprise in the case of penicillin, and we haveseen that the KEKADA discovery model,already discussed in connection with modelingthe research of Krebs and Faraday, addresses thesurprise issue directly. When performing anexperiment, the scientist simulated by KEKADAforms expectations, based on previousexperience, about outcomes. When the actualoutcomes violate the expectations, the scientistis surprised and, in the KEKADA theory, takessteps to explain the surprising phenomenon.These steps may include steps to discover thescope and generality of the phenomenon andthen steps to discover its mechanism.

The KEKADA model permits an examinationof surprise as it arises in different experimentalenvironments, allowing a rigorous statement of ageneral theory of the role of surprise indiscovery and of its mechanisms. In the case ofKrebs, it shows how an unexpected large yieldof urea in the presence of a particular aminoacid, ornithine, led Krebs to discover thecatalytic role of ornithine in the production ofurea from ammonia. Similarly, surprise atobtaining a transient flux of electricity in acircuit when a nearby magnet was activated ledFaraday, through a long series of experimentsaimed at understanding the mechanism of thesurprising phenomenon, to discover the meansof using magnets to produce continuous electriccurrents.

In a laboratory study in which participantshad to discover the function of an unknown keyon a simulated “rocket ship”, Klahr, Fay, &Dunbar (1993) investigated the effects ofsurprise by providing participants withsuggested hypotheses about how the keyworked. These hypotheses were designed to beeither highly plausible or highly implausible.Adults and children had quite different reactionsto implausible hypotheses. Adults usuallyproposed a competing hypothesis and thengenerated experiments that could distinguishbetween them. On the other hand, young

Page 21: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 21

children (3rd graders) tended to dismiss animplausible hypothesis and ignore evidence thatit might be correct. Instead, they adopted a kindof engineering mode in which they attempted todemonstrate that their favored hypothesis wascorrect by showing that (under somecircumstances) they could control the behaviorof the device. These results imply that animportant aspect of the development of scientificthinking is coming to accept, rather than deny,surprising results, and to explore further thephenomenon that gave rise to them.

Here we see historical studies, simulationmodels, and laboratory experiments allconverging on a particular set of phenomena —in this case, reaction to surprise — therebygenerating and testing new theories to describeand explain them. Observational studies can alsocontribute to this convergence if good fortuneleads to the observation of an "aha" event or asurprise.

The BACON model, which we havementioned several times, was built to test theveridicality and usefulness of the hypothesisthat, especially in new fields of science, wheretheory is poorly developed or absent,observation typically precedes hypothesisconstruction: that the first step in progress is tofind patterns (laws) in data. We have alreadydescribed how BACON finds Kepler's third lawon its third or fourth try, and performs withsimilar efficiency in a large number of othertasks. Here we see the emergence of a generaltheory of data-driven science based on heuristicsearch by a hypothesis generator of the space ofpossible problem solutions guided by feedback(knowledge of results) to the generator ofsuccessive hypotheses. The hypotheses are acombined product of the internal structure of thegenerator and of the observed data.

In order to test the range of its applicability asa theory of discovery, BACON, like KEKADA,can be applied to the findings of historical,laboratory, or observational studies ofdiscoveries, and the acuteness of the test of itsveridicality is only limited by the completenessand temporal resolution of the data that areavailable.

The Role of Analogy and Recognition Although philosophers have long been

interested in the role of analogy in science(Duhem, 1914; Hesse, 1966), it is only In thepast 25 years that analogy has assumedprominence in theories of problem solving andscientific discovery and that its underlyingcognitive mechanisms have been studied in

detail (Darden, 1980; Gentner, 1982). Holyoak& Thagard (1995) provide several examples ofanalogical problem solving in major scientificdiscoveries, ranging from a first-century analogybetween sound and water waves to Turing'smind/computer analogy. As we argued earlier,analogy can be viewed as a complex form ofrecognition. Holyoak & Thagard also emphasizethe role of analogical thinking in cognitivedevelopment (see Goswami, 1996, for anextensive review) and Nersessian (1984)documents its role in several of the majorscientific discoveries of the 19th century.Analogical reasoning also plays a central role inrecent analysis of the thinking processes ofcontemporary scientists working in their labs(Dunbar, 1994, Thagard, 1997, Ueda, 1997), andit is often the method of choice for formulatinginitial hypotheses and experiments in a varietyof discovery contexts.

Multiple Search SpacesThe reciprocal relation between hypotheses

and phenomena that we have just observed hasbeen noticed in a number of different approachesto scientific discovery, including laboratorystudies, historical studies, and computationalmodels of discovery. A problem-solvingorientation enables us to use a common languagein describing all of these in terms of search inmultiple spaces. We begin this discussion with acharacterization that includes only two distinctspaces, and then we expand the number ofspaces as required.

The two-space view was first proposed byKlahr & Dunbar (1988) in order to account forthe results of their laboratory study in whichparticipants had to discover the functionality of aparticular control button on a programmable toyvehicle. Klahr & Dunbar found that participantssometimes searched for experimentalmanipulations that would provide newinformation about the button's functions, andsometimes searched for rules that explaineddevice’s behavior in response to themanipulations. Noting that Simon & Lea(1974) had proposed that human conceptformation employs searches in separate instanceand hypothesis spaces, Klahr & Dunbarextended the “dual search” notion to the domainof scientific discovery, where one has tocoordinate search in two spaces: a space ofexperiments and a space of hypotheses.

Search in the Hypothesis Space. Generatingnew hypotheses is a type of problem solving inwhich the initial state consists of someknowledge about a domain, and the goal state is

Page 22: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 22

a hypothesis that can account for some or all ofthat knowledge in a more concise or universalform. Once generated, hypotheses are evaluatedfor their initial plausibility. Expertise plays arole here, as participants' familiarity with adomain tends to give them strong biases aboutthe plausibility of hypothesis. Plausibility, inturn, affects the order in which hypotheses areevaluated: highly likely hypotheses tend to betested before unlikely hypotheses (Klayman &Ha, 1987; Wason, 1968). Furthermore,participants may adopt different experimentalstrategies for evaluating plausible andimplausible hypotheses.

Search in the Experiment Space. Hypothesesare both generated from and evaluated throughexperimentation. But it is not immediatelyobvious what constitutes a "good" or"informative" experiment. In constructingexperiments, scientists are faced with a problem-solving task paralleling their search forhypotheses. However, in this case search is in aspace of experiments rather than a space ofhypotheses. If experiments are used to generatenew information, then they should be designedto maximize the likelihood that they will revealsomething of interest. If they are being used totest hypotheses, they should discriminate amongrival hypotheses. Both uses of experimentaloutcomes involve search in a space ofexperiments that is only partially defined at theoutset. Constraints on the search must be addedduring the problem solving process.

The dual search notion can be used toillustrate the convergence of several types ofinvestigations of scientific discovery. In theirlaboratory studies, Klahr & Dunbar found thatsome participants ("experimenters") focused onsearching the space of possible manipulations,whereas other participants ("theorists") focusedon the space of possible explanations of theresponses. Similar differences in preferencebetween experiment-driven and theory-drivenstrategies have been noticed in other laboratorystudies (Okada, 1994; Okada & Simon, 1995).Studies based on historical approaches can beinterpreted in terms of the balance betweenhypothesis-space search and experiment-spacesearch. For example, in most histories ofFaraday's discovery of induction of electricity bymagnets, much emphasis has been placed on theinfluence of Ampère's theory of magnetism uponFaraday's thought, but a strong case can be made(Gooding, 1990) that Faraday's primary searchstrategy was to focus on experiment spacesearch, yielding a discovery path that was drivenlargely by phenomena rather than theory.

The dual space characterization revealsanother type of convergence by allowing us tocategorize computational models of discoveryaccording to which space they emphasize. Somefocus mainly on search in the hypothesis space:e.g., the BACON models & variants (Langley,Simon, Bradshaw, & Zytkow, 1987), IDS(Nordhausen & Langley, 1993), PHINEAS(Falkenhainer, 1990), COPER (Kokar, 1986),MECHEM (Valdés-Pérez, in press), HYPGENE(Karp, 1990), AbE (O’Rorke, Morris, &Schulenburg, 1990), OCCAM (Pizzani, 1990),and ECHO (Thagard, 1988). Othercomputational models focus mainly on theprocess of experiment generation andevaluation: e.g., DEED (Rajamoney, 1993), andDIDO (Scott & Markovitch, 1993). A few dealwith both processes: e.g., KEKADA (Kulkarni& Simon, 1988), STERN (Cheng, 1990), HDD(Reimann, 1990), IE (Shrager, 1985), and LIVE(Shen, 1993). This categorization mightprovide a starting point for the integration ofthese different models into a very richcomputational implementation of the dual searchframework.

Beyond two spaces. Although the two-spacemodel was adequate to capture most of thebehavior of participants in the Klahr & Dunbarwork, analysis of participants’ behavior in amore complex “microworld” laboratory(Schunn, 1995) necessitated the expansion froma two-space to a four-space model, depicted inFigure 1. In this model, the hypothesis space hasbeen expanded to included both a datarepresentation space and a hypothesis space. Inthe data representation space, representations orabstractions of the data are chosen from the setof possible features. What people search for inthis space is an effective and informative way torepresent the phenomena they are observing.Additional support for the importance of arepresentation space comes from Cheng &Simon’s (1992) analysis of Gallileo’s research inwhich they compare the relative difficulty ofmathematical and diagrammatic representations.Here, as in many areas of science, finding theright representation is crucial and it requiresheuristic search -- with all of its associated weakmethods -- in a large space of possibilities.

Figure 1 also shows how the experimentspace is now divided into an experimentalparadigm space and an experiment space. In theexperimental paradigm space, a class ofexperiments (i.e., a paradigm) is chosen whichidentifies the factors to vary, and thecomponents which are held constant. In theexperiment space, the parameters settings withinthe selected paradigm are chosen.

Page 23: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 23

It should be clear that there is no “right”number of spaces4, because that is entirelydependent on the nature of the discovery context(Langley, et al., 1987). In his analysis of thediscovery of the bacterial origins of stomachulcers, Thagard (forthcoming a,b) demonstratesthe importance of search in at least three majorspaces: hypothesis space, experiment space, anda space of instrumentation.

Figure 1 A four-space model of scientificdiscovery. The arrows indicate the direction ofinformation flow among the four spaces. (FromSchunn & Klahr, 1995)

Search in the space of representations. Onemethod for searching the representation spacethat has attracted considerable attention isanalogy. A prominent example is Bohr's use ofthe solar system analogy in arriving at hisquantum model of the hydrogen atom. Heviewed the planetary electrons as orbiting thenucleus, ignoring the fact that, according toclassical physics, the charged electrons wouldproduce a magnetic field, thereby dissipatingenergy until they fell into the nucleus. Insteadof abandoning the analogy, he borrowedPlanck's quantum, which allowed energy to bedissipated only in leaps of quantum size, thenshowed that these leaps would produce a lightspectrum corresponding exactly to the Balmer'sseries of the hydrogen spectrum, introducedthirty years previously. Not an analogy somuch as a thoroughly mixed metaphor, onemight say, but a successful one; though itrequired much tinkering and radical additionalrepresentation changes (ultimately,Schrödinger's wave equations and Heisenberg's

4. Even though some scientists -- present companyincluded -- have been drawn into debates aboutwhether the magic number is two (Klahr & Dunbar,1988; van Joolingen, W. R., & de Jong, T., 1997),three (Baker & Dunbar, 1996; Burns & Vollmeyer,1997), four (Schunn & Klahr, 1995, 1996) or N(Wolf & Beskin, 1996).

matrix mechanics) before it could be extendedsystematically beyond hydrogen and ionizedhelium to the other elements.

In the autumn and early winter of 1831-32,when he was making his fundamentaldiscoveries of induction of electric currents bymagnetism, Faraday went through a wholeseries of representations of the phenomena hewas seeing which are revealed by his diary.Initially he visualized the sudden energizing of amagnet as creating a current in any nearbyclosed metallic circuit, but this state beingimmediately terminated by the creation of an"electrotonic state" in the circuit which opposedthe flow of current. He tried hard, butunsuccessfully, to find independent experimentalevidence for the electrotonic state. Then henoticed that if a magnet moved continually inthe neighborhood of the circuit, a continuouscurrent would be produced -- the phenomenoncould be represented in terms of relativemovement. Next, he visualized the lines ofmagnetic force (which he had long been familiarwith as revealed by the arrangement of ironfilings around a magnet), and theorized that thecurrent was induced when the relativemovement of circuit and magnet caused theformer to cut the lines of magnet force aroundthe latter. In Faraday's case, it was not ananalogy that led to this final representation, but aseries of phenomena observed in the course of along series of experiments (most of them notpredicted before the experiments were run)combined with the experience of actually"seeing" the lines of force.

In chemistry at the end of the 18th Century, amajor event was the rather rapid shift from thephlogiston theory of combustion to the oxygentheory, in which a change in representation ofthe standard experiments played a major role.In earlier experiments on combustion, the mainphenomena observed were (a) the solid materialsof combustion and their residues, and (b) theheat, flame, and smoke produced during thecombustion process. The latter provided thebasis for hypothesizing a "phlogiston," whichwas supposed to be driven out of materials in thecourse of combustion. New methods ofobserving and measuring the volumes andpressures of gases showed that large quantitiesof gasses were absorbed, and other gasesproduced, during combustion in air. Forexample, oxygen was often absorbed and carbondioxide given off. A focus of attention on thegases instead of the heat and flame changed therepresentation of the combustion process andproduced the new oxygen theory.

Page 24: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 24

From these examples we see thatrepresentations can derive from many sources:analogies, phenomena produced by, but notpredicted by, experiment, and even new ways of"seeing" experiments triggered by newinstruments of observation.

Search in the strategy space. Finally,changes in strategy, even while a fixed problemrepresentation is maintained, may play animportant role in discovery. Often the changein strategy results from, or leads to, theinvention of new scientific instruments orprocedures. Breeding experiments are a tool ofgenetics research that goes back to GregorMendel (and for the applied genetics ofagriculture, many centuries further back thanthat). The productivity of such experimentsdepended on the rates at which mutationsoccurred. Müller, with the "simple" idea that x-rays could induce higher rates of mutation,substantially improved that productivity.

A number of issues regarding the relativeefficacies of different strategies for research onscientific discovery have been discussed in thispaper: strategies of using observational studiesand exploratory experiments versus controlledexperiments, or choices among historicalstudies, laboratory experiments andobservations. These same kinds of choicesmust be made in the other domains of scientificresearch.

Relation Between Science Studies and MoreGeneral Studies of Creativity and ProblemSolving

We come now to our final generalization: thehypothesis that the theory of scientific discoveryis a special case of the general theory of problemsolving, the special features being supplied bythe strong methods of each discipline and theknowledge and procedures that support them,while the ubiquitous weak methods supply thecommonalities. In our exploration of scientificdiscovery, we have seen that (1) it is based uponheuristic search in a set of problem spaces:spaces of instances, of hypotheses, ofrepresentations, of strategies, of instruments, andperhaps others; (2) the control structures forsearch are such general mechanisms as trial-and-error, hill-climbing, means-ends-analysis, andresponse to surprise; (3) recognition processes,evoked by familiar patterns recognized inphenomena, evoke knowledge and strongmethods from memory, thereby linking the weakmethods to the mechanisms that are domain-specific.

All of the constructs and processes mentionedabove are also the constructs and processes thatare encountered in problem solving in all thedomains in which it has been studied. A painteris not a scientist. Nor is a scientist a lawyer or abusinessman, or a machinist, or a cook. Butthey share the same general approach to solvingtheir respective problems, and use the sameweak methods. When their problem solvingactivity is described at the level we have beenusing in this paper, each can understand therationale of the expert's activity, howeverabstruse and arcane the content of their specialexpertise may appear.

If we press to the boundaries of creativity, themain difference we see from more mundaneexamples of problem solving is that theproblems become less well structured,recognition becomes less powerful in evokingpre-learned solutions or powerful domain-specific search heuristics, and more, not less,reliance has to be placed on weak methods.The more creative the problem solving, the moreprimitive the tools. Perhaps this is why"childlike" characteristics, such as thepropensity to wonder, are so often attributed tocreative scientists and artists.

Page 25: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 25

REFERENCESBaker, L. M., & Dunbar, K. (1996). Problem spacesin real-world science: What are they and how doscientists seach them? In G. W. Cottrell (Ed.),Proceedings of the Eighteenth Annual Conference ofthe Cognitive Science Society (pp. 21-22). Mahwah,NJ: Lawrence Erlbaum Associates.Banet, L. (1966). Evolution of the Balmer Series.American Journal of Physics, 34 , 496-503.

Barlow, D. H. & Hersen, M. (1984) Single caseexperimental designs: Strategies for studyingbehavior change (2nd ed.) New York: Pergamon.

Bazerman, C. (1988) Shaping written knowledge :the genre and activity of the experimental article inscience Madison, WI: University of WisconsinPress.Bijker, W. E., Hughes, T. P., & Pinch, T. (1987). Thesocial construction of technological systems: Newdirections in the sociology and history of technology .Cambridge, MA: MIT Press.Bloor, D. (1981) Knowledge and social imagery .London: Routledge and Kegan Paul.Boden, M. A. (1990). The creative mind: Myths andmechanisms . London: Basic Books.

Boden, M. A. (1994). Precis of the creative mind:Myths and mechanisms. Behavioral and BrainSciences , 17 , 519-570.

Brewer, W. F., & Samarapungavan, A. (1991). Childtheories versus scientific theories: Differences inreasoning or differences in knowledge? In R. R.Horfman & D. S. Palermo (Eds.), Cognition andrhyme symbol processes: Applied and ecologicalperspectives , Hillsdale, NJ: Erlbaum.

Bruner, J. S., Goodnow, J. J., & Austin, G.A. (1956).A study of thinking . New York: NY ScienceEditions.Burns, B. D. & Vollmeyer, R. (1997) A Three-spaceTheory of Problem Solving. In M. G. Shafto & P.Langley (Eds.) Proceedings of Nineteenth AnnualMeetings of Cognitive Science Society. Mahwah, NJ:ErlbaumCallahan, J. D. & Sorensen, S. W. (1992) UsingTETRAD II as an Automated exploratory tool.Social Science Computer Review, 10, (3), 329 -336.Cheng, P. C.-H. (1990). Modeling scientificdiscovery. Unpublished doctoral dissertation, TheOpen University, Milton Keynes.

Cheng, P. C-H., & Simon, H. A. (1992). The rightrepresentation for discovery: finding the conservationof momentum. In D. Sleeman & P. Edwards (Eds.),Machine Learning: Proceedings of the NinthInternational Conference (ML92)

Chinn, C. A., & Brewer, W. F. (1998) An empiricaltest of a taxonomy of responses to anomalous data inscience. Journal of Research in Science Teaching , 35,6, 623-654.Cook, T. D. & Campbell, D. T. (1979) Quasi-experimentation: Design and analysis issues for fieldstudies. Chicago:Rand McNally.Crick, F. (1988). What mad pursuit: A personal viewof scientific discovery . New York: Basic Books.

Darden, L. (1980) Theory construction in genetics.in T. Nickles (Ed.) Scientific Discovery: CaseStudies . Boston, MA: D. Reidel PublishingCompany. p. 151-170.Darden, L. (1992) Strategies for AnomalyResolution. In R. N. Giere (Ed.) Cognitive models ofscience. Minnesota studies in the philosophy ofscience (Vol. 15). Minneapolis: University ofMinnesota Press. p. 251-271.Darden, L. (1997) Recent work in computationalscientific discovery. In M. G. Shafto & P. Langley(Eds.) Proceedings of Nineteenth Annual Meetings ofCognitive Science Society. Mahwah, NJ: Erlbaum

Darden, L. & Cook, M. (1995) Reasoning strategiesin molecular biology: Abstractions, scans andanomalies. Philosophy of Science , 1994, vol. 2, 179-191. Philosophy of Science Association.Duhem, P. (1954/1914) The aim and structure ofphysical theory. Part I, Chapter, IV . Princeton:Princeton University Press. (First published as LaTheorie physique , Paris, 1914).

Dunbar, K. (1993). Concept discovery in a scientificdomain. Cognitive Science , 17 , 397-434.

Dunbar, K. (1994). How scientists really reason:Scientific reasoning in real-world laboratories. In R.J.Sternberg & J. Davidson (Eds.), Mechanisms ofInsight . Cambridge, MA: MIT Press.

Dunbar, K. (1997) How scientists think: On-linecreativity and conceptual change in science. In T.Ward, S. Smith, & S. Vaid (Eds.) Conceptualstructures and processes: Emergence, Discovery andChange. APA Press. Washington DC. pp. 461-492.

Dunbar, K., & Baker, L. A. (1994). Goals, analogy,and the social constraints of scientific discovery.Behavioral and Brain Sciences , 17 , 538-539.

Page 26: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 26

Duncan, S.C. & Tweney, R.D. (1997). The Problem-Behavior Map as cognitive-historical analysis: Theexample of Michael Faraday. Proceedings of theNineteenth Annual Conference of the CognitiveScience Society , p. 901. Hillsdale, NJ: LawrenceErlbaum Associates.Dunker, K. (1945). On problem solving,Psychological Monographs, 58 (5). (270).

Einstein, (1936). Physics & Reality. Reprinted in A.Einstein (1950) Out of my later years. New York:Philosophical LibraryEinstein, A. (1950) Out of my later years . New York:Philosophical LibraryErdos, P., Fajtlowicz, S., & Staton, W. (1991).Degree sequences in the triangle-free graphs,Discrete Mathematics , 92 (91), 85-88.

Ericsson, K. A. & Simon, H. A. (1993) Protocolanalysis : verbal reports as data Rev. ed.Cambridge, Mass. : MIT PressFalkenhainer, B. (1990). A unified approach toexplanation and theory formation. In J. Shrager & P.Langley (Eds.), Computational models of scientificdiscovery and theory formation (pp. 157-196). SanMateo, CA: Morgan Kaufman.Fay, A. & Klahr, D. (1996) Knowing about guessingand guessing about knowing: Preschoolers'understanding of indeterminacy . Child Development, 67, 689-716.Feist, G.J. (1991). The psychology of science:Personality, cognitive, motivational and workingstyles of eminent and less eminent scientists.Unpublished dissertation, University of California,Berkeley.Feist, G.J. (1994). Personality and working stylepredictors of integrative complexity: A study ofscientists' thinking about research and teaching.Journal of Personality and Social Psychology , 67,474-484.Feist & Gorman (1998) The psychology of science:Review and integration of a Nascent Discipline.Review of General Psychology , 2,1, 3-47.

Finke, R. A., Ward, T. B., & Smith, S. M. CreativeCognition , Cambridge, MA: MIT Press.

Galison, P. (1987). How experiments end . Chicago:University of Chicago Press.Gentner, D. (1982). Are scientific analogiesmetaphors? In D. S. Miall (Ed.), Metaphor:Problems and Perspectives , pp. 106-132. Brighton,UK: Harvester.

Gentner,D. & Jeziorski (1989) Historical shifts in theuse of analogy in science.In B. Gholson, W. Shadish,R. Beimeyer, & A. Houts (Eds.) The psychology ofscience: Contributions to metascience. 296-325.Cambridge: Cambridge University Press.Giere, R. N. (1988). Explaining Science: A CognitiveApproach . Chicago, IL: University of Chicago Press.

Gingerich, O. (1975). The origins of Kepler's ThirdLaw. In A. Beer and P. Beer (eds.), Kepler: FourHundred Years. Oxford, UK: Pergamon.

Gooding, D. (1990). Experiment and The making ofmeaning . Dorcrecht: Nijhoff/Kluwer.

Gorman, M. E. (1992). Simulating science:Heuristics, mental models, and technoscientificthinking . Bloomington, IN: Indiana University Press.

Goswami, U. (1996). Analogical reasoning andcognitive development. Advances in ChildDevelopment and Behavior , 26 , 91-139.

Grasshoff, G. & May, M. (1995). From historicalcase studies to systematic methods of discovery.Working notes: AAAI Spring Symposium onSystematic Methods of Scientific Discovery. Stanford CA: AAAI.Gross, P. R., & Levitt, N. (1994). Higher superstition:The academic left and its quarrels with science .Baltimore: Johns Hopkins University Press.Gruber, H. E. (1974) Darwin on man: Apsychological study of Scientific creativity. NewYork, E. P. Dutton.Hadamard, J. (1945) The psychology of invention inthe mathematical field. New York: Dover

Halford, G. S. (1992) Analogical reasoning andconceptual complexity in cognitive development.Human Development 35: 193-217.

Hanson, N. R. (1958). Patterns of discovery. Cambridge, UK: Cambridge University Press.Hendrickson, J. B., & Sander, T. (1995). COGNOS:A Beilstein-type system for organizing organicreactions. Journal of Chemical Information andComputer Sciences. 35, 2, 251.

Hesse, M. B. Models and analogies in science. NotreDame, Indiana: University of Notre Dame Press.Holmes, F. L. (1985). Lavoisier and the chemistry oflife: An exploration of scientific creativity . Madison:University of Wisconsin Press.

Page 27: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 27

Holmes, F. L. (1991). Hans Krebs: The Formationof a Scientific Life, 1900-1933, Volume 1. NewYork, NY: Oxford University Press.Holyoak, K. J., & Thagard, P. (1995). Mental leaps:Analogy in creative thought . Cambridge, MA: MITPress.Hovland, C. & Hunt, E. (1960) The computersimulation of concept attainment. BehavioralScience . 5, 265-267

Ippolito, M.F. & Tweney, R.D. (1995). The inceptionof insight. In R.J. Sternberg & J.E. Davidson (eds.)The nature of insight. (pp. 433-462). Cambridge,MA: The MIT Press.Jacob, F.(1988) The statue within . New York: BasicBooks.Jacob, F. & Monod, J. (1961) Genetic regulatorymechanisms in the synthesis of proteins. Journal ofMolecular Biology, 3 , 318-356.

Judson, H. F. (1979) The eighth day of creation .New York: Simon & Shuster.Judson, H. F. (1996) The eighth day of creation :Makers of the Revolution in Biology. ExpandedEdition. Plainview, New York: Cold SpringHarbour Laboratory Press.Karmiloff-Smith, A. (1988). A child is a theoretician,not an inductivist. Mind and Language , 3 , 183-195.

Karp, P. (1990). Hypothesis formation as design. In J.Shrager & P. Langley (Eds.), Computational modelsof scientific discovery and theory formation. SanMateo, CA: Morgan Kaufmann.Kepler, J. (1937ff). Gesammelte Werke. W. vonDyck, M. Caspar, F. Hammer, et al. (eds.). Munich,Kepler, J. (1596). Mysterium cosmographicum . InGesammelte Werke, Vol. I.

Kepler, J. (1619). Harmonice mundi. In GesammelteWerke, Vol VI.

Kern, L. H., Mirels, H. L, & Hinshaw, V. G. (1983).Scientists' understanding of propositional logic: Anexperimental investigation. Social Studies of Science ,13 , 131-146.

Klahr, D. (in press) Exploring Science: the Cognitionand Development of Discovery Processes. Cambridge, MA: MIT Press.Klahr, D., & Dunbar, K. (1988). Dual space searchduring scientific reasoning. Cognitive Science , 12 , 1-55.

Klahr, D., Fay, A. L., & Dunbar, K. (1993).Heuristics for scientific experimentation: Adevelopmental study. Cognitive Psychology , 24 (1),111-146.Klayman, J., & Ha, Y. (1987). Confirmation,disconfirmation and information in hypothesistesting. Psychological Review , 94 , 211-228.

Kokar, M. M. (1986). Determining arguments ofinvariant functional descriptions. Machine Learning ,1(4), 403-422.Kratochwill, T.R. (Ed.) (1978) Single-subjectresearch: Strategies for evaluating change . NewYork, Academic Press.Kuhn, D. (1989). Children and adults as intuitivescientists. Psychological Review , 96 , 674-689.

Kuhn, D., Amsel, E., & O'Loughlin, M. (1988). Thedevelopment of scientific reasoning skills . Orlando,FL: Academic Press.Kuhn, D., Garcia-Mila, M., Zohar, A., Andersen, C.(1995) Strategies of Knowledge Acquisition.Monographs of the Society for Research in ChildDevelopment. #245, 60, no. 3.

Kulkarni, D. & Simon, H.A. (1988). The process ofScientific Discovery: The strategy ofExperimentation. Cognitive Science , 12, 139-176.

Kulkarni, D., & Simon, H. A. (1990).Experimentation in machine discovery. In J. Shrager& P. Langley (Eds.), Computational models ofscientific discovery and theory formation . San Mateo,CA: Morgan Kaufmann.Langley, P., Simon, H. A., Bradshaw, G. L., &Zytkow, J. M. (1987). Scientific discovery:Computational explorations of the creative processes .Cambridge, MA: MIT Press.Latour, B. & Woolgar, S. (1986) Laboratory Life:The Construction of Scientific Facts. Princeton, NJ:Princeton University Press.Laudan, L. (1977) Progress and its problems .Berkeley: University of California Press.Le Grand, H. E. (Ed.) (1990) Experimental Inquiries:Historical, Philosophical and Social Studies ofExperimentation in Science . Boston: KluwerAcademic Publishers.Luchins, A. S. (1942) Mechanization in problemsolving. Psychologcial Monographs, 54: 6, WholeNo., 248.Mahoney, M.J. (1979). Psychology of the scientist:An evaluative review. Social Studies of Science, 9,349 - 375

Page 28: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 28

Mitroff, L. L. (1974). The subjective side of science .New York: Elsevier.Mynatt, C. R., Doherty, M. E., & Tweney, R. D.(1977). Confirmation bias in a simulated researchenvironment: An experimental study of scientificinference. Quarterly Journal of ExperimentalPsychology , 29 , 85-95.

Mynatt, C. R., Doherty, M. E., & Tweney, R. D.(1978). Consequences of confirmation anddisconfirmation in a simulated research environment.Quarterly Journal of Experimental Psychology , 30 ,395-406.Nersessian, N. J. (1984) Faraday to Einstein:Constructing Meaning in Scientific Theories .Dordrecht: Martinus Nijhoff.Nersessian, N. J. (1992) How do scientists think?Capturing the dynamics of conceptual change inscience. In R. N. Giere (Ed.) Cognitive models ofscience. Minnesota studies in the philosophy ofscience (Vol. 15). Minneapolis: University ofMinnesota Press. p. 3 - 44. Newell, A., & Simon, H. A. (1972). Human problemsolving . Englewood Cliffs, NJ: Prentice-Hall .

Nordhausen, B., & Langley, P. (1993). An integratedframework for empirical discovery. MachineLearning , 12, 17-47.

O’Rorke, P., Morris, S., & Schulenburg, D. (1990).Theory formation by abduction: A case study basedon the chemical revolution. In J. Shrager & P.Langley (Eds.), Computational models of scientificdiscovery and theory formation. San Mateo, CA:Morgan Kaufmann.Okada, T. (1994). Collaborative scientific discoveryprocesses . Unpublished doctoral dissertation.Carnegie Mellon University, Pittsburgh, PA.Okada, T., & Simon, H.A. (1997). Collaborativediscovery in a scientific domain. Cognitive Science ,21, 109-146.Parvanno, C. (1990). quoted in Hands On, by J.Elder, in Education Life, NYT, special supplement,January 7.Penner, D. E., & Klahr, D. (1996). When to trust thedata: Further investigations of system error in ascientific reasoning task. Memory & Cognition, 24(5), 655-668.Perkins, D. N. (1981). The mind's best work .Cambridge, MA: Harvard University Press.Pickering, A. (Ed.) (1992) Science as Practice andCulture. Chicago, IL: University of Chicago Press.

Pizzani, M. J. (1990). Creating a memory of causalrelationships. Hillsdale, NJ: Erlbaum.

Poincare’, H. (1929) The foundation of science . NewYork: The Science Press.Popper, K. R. (1959). The logic of scientificdiscovery . London: Hutchinson.

Qin, Y. & Simon, H.A. (1990) Laboratory replicationof scientific discovery processes Cognitive Science, 14 : 281-312.Rajamoney, S. A. (1993). The design ofdiscrimination experiments. Machine Learning, 12,185-203.Reimann, P. (1990). Problem solving models ofscientific discovery learning processes. Frankfurt amMain: Peter Lang.Rowe, A. (1953) The making of a scientist . NewYork:Dodd, Mead.Samarapungavan, A. (1992). Children's judgments intheory-choice tasks: Scientific rationality inchildhood. Cognition , 45 , 1-32.

Schauble, L. (1990). Belief revision in children: Therole of prior knowledge and strategies for generatingevidence. Journal of Experimental Child Psychology ,49 , 31-57.

Schauble, L., & Glaser, R (1990). Scientific thinkingin children and adults. In D. Kuhn (Ed.),Contributions to human development: Vol. 21:Developmental perspectives on teaching and learningthinking skills (pp.9-26). Basel, New York: Karger.

Schunn, C. D. (1995). A Goals/Effort Tradoff Theoryof Experiment Space Search. Unpublished DoctoralDissertation. Department of Psychology, CarnegieMellon UniversitySchunn, C. D., & Anderson, J. R. (in press). Thegenerality/specificity of expertise in scientificreasoning. Cognitive Science .

Schunn, C. D., & Klahr, D. (1995). A 4-space modelof scientific discovery. In J. D. Moore & J. F.Lehman (Eds.) Proceedings of the SeventeenthAnnual Conference of the Cognitive Science Society (pp. 106-111).Schunn, C. D. & Klahr, D. (1996) Integrated YetDifferent: Logical, Empirical, and ImplementationalArguments for a 4-Space Model of InductiveProblem Solving. In G. Cottrell (Ed.) Proceedings ofEighteenth Annual Meetings of Cognitive ScienceSociety. Mahwah, NJ:Erlbaum

Page 29: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 29

Scott, P. D., & Markovitch, S. (1993). Experienceselection and problem choice in an exploratorylearning system. Machine Learning, 12, 49-67.

Shadish, W. R., & Fuller, S. (Eds.). (1994). Socialpsychology of science . New York: Guilford Press.

Shen, W-M. (1993). Discovery as autonomouslearning from the environment. Machine Learning, 12, 143-165.Shrager, J. (1985). Instructionless learning:Discovery of the mental model of a complex device .Doctoral dissertation. Carnegie Mellon University,Pittsburgh, PA.Shrager, J., & Langley, P. (1990). Computationalmodels of scientific discovery and theory formation .San Mateo, CA: Morgan Kaufman.Siegler, R. S., & Liebert, R. M. (1975). Acquisitionof formal scientific reasoning by 10- and 13 year-olds: designing a factorial experiment.Developmental Psychology , 10 , 401-402.

Simon, H. A. (1966). Scientific discovery and thepsychology of problem solving. In R. Colodny (Ed.),Mind and cosmos . Pittsburgh, PA: University ofPittsburgh Press.Simon, H. A. (1973), Does scientific discoveryhave a logic? Philosophy of Science 40 , 471-80.Simon, H. A. (1974), How big is a chunk?Science, , 183 , 482-488.Simon, H. A. (1983) Fitness requirements forscientific theories. British Journal for the Philosophyof Science, 34 :355-365

Simon, H. A. (1985) Quantification of theoreticalterms and the falsifiability of theories. BritishJournal for the Philosophy of Science, 36 :291-298.

Simon, H.A., & Kotovsky, K. (1963) Humanacquisition of concepts for sequential patterns.Psychological Review . 70 (6): 534-546.

Simon, H. A., Langley, P., & Bradshaw, G. L.(1981). Scientific discovery as problem solving.Synthese , 47 , 1-27.

Simon H. A., & Lea, G. (1974). Problem solving andrule induction: A unified view. In L. Gregg (Ed.),Knowledge and cognition (pp. 105-128). Hillsdale,NJ: Lawrence Erlbaum Associates.Sodian, B., Zaitchik, D., & Carey, S. (1991). Youngchildren's differentiation of hypothetical beliefs fromevidence. Child Development , 62 , 753-766.

Sokal, A. (1996). Transgressing the boundaries:Towards a transformative hermeneutics of quantumgravity. Social Text , 46/47 , 217-252.

Terman, L.M. (1954). Scientists and nonscientists ina group of 800 men. Psychological Monographs, 68,Whole No. 378.Thagard, P. (1988). Computational philosophy ofscience . Cambridge, MA: MIT Press.

Thagard, P. (1992). Conceptual revolutions .Princeton, NJ: Princeton University Press.Thagard, P. (1997). Medical analogies: Why andhow. In M. G. Shafto & P. Langley (Eds.)Proceedings of Nineteenth Annual Meetings ofCognitive Science Society. Mahwah, NJ: Erlbaum.(pp. 739-763).Thagard, P. (forthcoming-a). Ulcers and bacteria I:Discovery and acceptance. Studies in History andPhilosophy of Science .

Thagard, P. (forthcoming-b). Ulcers and bacteria II:Instruments, experiments, and social interactions.Studies in History and Philosophy of Science.

Tweney, R. D., Doherty, M. E., & Mynatt, C. R.(Eds.). (1981). On scientific thinking. New York:Columbia University Press.Udea, K. (1997). Actual use of analogies inremarkable scientific discoveries. Proceedings of theNineteenth Annual Conference of the CognitiveScience Society In M. G. Shafto & P. Langley(Eds.) Proceedings of Nineteenth Annual Meetings ofCognitive Science Society. Mahwah, NJ: Erlbaum.(pp. 763-768).Valdés-Perez, R. E. (1994a). Conjecturing hiddenentities via simplicity and conservation laws:Machine discovery in chemistry. ArtificialIntelligence , 65 (2), 247-280.

Valdés-Perez, R. E. (1994b). Algebraic reasoningabout reactions: Discovery of conserved properties inparticle physics. Machine Learning , 17 (1), 47-68.

Valdés-Perez, R. E. (1994c). Human/computerinteractive elucidation of reaction mechanisms:Application to catalyzed hydrogenolysis of ethane.Catalysis Letters , 28 (1): 79-87.

Valdes-Perez, R. E. (1995) Some RecentHuman/Computer Discoveries inScience and What Accounts for Them, AI Magazine ,16(3):37-44, 1995.

Page 30: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 30

van Joolingen, W. R. & de Jong, T. (1997) Anextended dual search space model of scientificdiscovery learning. Instructional Science, 25 , 307-346.Vosniadou, S., & Ortony, A. (1989) Similarity andanalogical reasoning . Cambridge: CambridgeUniversity Press.Wason, P. C. (1960). On the failure to eliminatehypotheses in a conceptual task. Quarterly Journal ofExperimental Psychology , 12 , 129-140.

Wason, P. C. (1968). Reasoning about a rule.Quarterly Journal of Experimental Psychology , 20 ,273-281.Webb, E. J., Campbell, D. T., Schwartz, R. D., &Sechrest, L. (1966) Unobtrusive measures:Nonreactive reserach in the social sciences. Chicago:Rand McNally.Weisberg, R. W. (1993). Creativity: Beyond the mythof genius . New York: W.H. Freeman.

Wellman, H. M. & Gelman, S. (1992) Cognitivedevelopment: Foundational theories in core domains.Annual Review of Psychology , 43, 337-75.

Wertheimer, M. (1945). Productive Thinking . NewYork, NY: Harper & Row.Williams. L. P. (1965). Michael Faraday , NewYork, NY: Basic BooksWolf, D. F II & Beskin, J. R. (1996) Task Domainsin N-Space Models: Giving Explanation Its Due. InG. Cottrell (Ed.) Proceedings of Eighteenth AnnualMeetings of Cognitive Science Society. Mahwah,NJ:Erlbaum

Page 31: STUDIES OF SCIENTIFIC DISCOVERY COMPLEMENTARY … · justification for the study of scientific discovery, a summary of the major approaches, and criteria for comparing and contrasting

Scientific Discovery 31