COGNITIVE LOAD FACTORS IN

INSTRUCTIONAL DESIGN FOR

ADVANCED LEARNERS

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or

by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no

expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No

liability is assumed for incidental or consequential damages in connection with or arising out of information

contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in

rendering legal, medical or any other professional services.



SLAVA KALYUGA

Nova Science Publishers, Inc. New York


Copyright © 2009 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system

or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape,

mechanical photocopying, recording or otherwise without the written permission of the

Publisher.

For permission to use material from this book please contact us:

Telephone 631-231-7269; Fax 631-231-8175

Web Site: http://www.novapublishers.com

NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no

expressed or implied warranty of any kind and assumes no responsibility for any errors or

omissions. No liability is assumed for incidental or consequential damages in connection

with or arising out of information contained in this book. The Publisher shall not be liable

for any special, consequential, or exemplary damages resulting, in whole or in part, from

the readers’ use of, or reliance upon, this material.

Independent verification should be sought for any data, advice or recommendations

contained in this book. In addition, no responsibility is assumed by the publisher for any

injury and/or damage to persons or property arising from any methods, products,

instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard

to the subject matter covered herein. It is sold with the clear understanding that the

Publisher is not engaged in rendering legal or any other professional services. If legal or

any other expert assistance is required, the services of a competent person should be

sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF

PUBLISHERS.

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

ISBN: 978-1-60741-685-2 (E-Book)

Available upon request

Published by Nova Science Publishers, Inc. New York


CONTENTS

Preface

Chapter 1 Basic Architecture of Human Cognition

Chapter 2 Cognitive Studies of Expert-Novice Differences and Design of Instruction

Chapter 3 Cognitive Load Perspective in Instructional Design

Chapter 4 Cognitive Load Principles in Instructional Design for Advanced Learners

Summary Toward a Cognitively Efficient Instructional Technology for Advanced Learners

Index


PREFACE

The empirical evidence described in this book indicates that instructional

designs and procedures that are cognitively optimal for less knowledgeable

learners may not be optimal for more advanced learners. Instructional designers or

instructors need to evaluate learners' levels of expertise accurately in order to design or

select optimal instructional procedures and formats. Frequently, learners need to

be assessed in real time during an instructional session in order to adjust the

design of further instruction appropriately. Traditional testing procedures may not

be suitable for this purpose. The following chapters describe a cognitive load

approach to the development of rapid schema-based tests of learner expertise. The

proposed methods of cognitive diagnosis will be based on contemporary

knowledge of human cognitive architecture and will be further used as means of

optimizing cognitive load in learner-tailored computer-based learning

environments.


Chapter 1

BASIC ARCHITECTURE OF HUMAN

COGNITION

A cognitive approach to human learning emphasizes the internal cognitive

mechanisms of learning. Such mechanisms are usually described as

transformations performed on various mental representations of situations and

tasks. An important assumption of the approach is that a single general cognitive

system underlies human cognition. Different theoretical approaches specify this

general cognitive system as corresponding cognitive architectures. The

understanding of human cognition within a cognitive architecture requires

knowledge of corresponding models of memory organization, forms of knowledge

representation, mechanisms of problem solving, and the nature of human

expertise.

MEMORY ORGANIZATION

The major characteristics of human memory are its strength or durability,

capacity (number of items of information stored in memory), and speed of access.

According to these characteristics, memory is divided into long-term memory and

short-term memory. Long-term memory (LTM) is characterized by high strength

and includes well-learned knowledge, for example, the name of the first US

President, 5 x 5 = 25, or the spelling of the word potatoes. It is presumed to have

unlimited capacity, although the access to the stored information could be slow.

Both the strength of memory and the speed of access increase with practice. More


fully elaborated and more deeply processed material results in better long-term

memory.

Short-term memory (STM), on the other hand, includes information that has

just been encoded from sensory registers or retrieved from long-term memory,

for example, what you were thinking about just before reading this, or what you

are thinking about while dialing the phone number 8344 2124. The durability of

STM is a matter of seconds (Peterson & Peterson, 1959), and information in STM

could be accessed very rapidly. The number of items of information that can be

maintained in an active state simultaneously in STM is about seven units for most

people (Miller, 1956). For example, it is very difficult for us to recall more than

approximately seven serially presented random numbers (e.g., an unfamiliar

phone number) a few seconds after we hear or see them, unless the numbers have

been intentionally rehearsed. When asked to copy strings of digits from one page

to another, we usually do this by grouping the digits by easily manageable units of

three or four at a time.

The most generally specified basic human cognitive architecture includes

these two substructures (STM and LTM). Examples are the standard model

(Newell & Simon, 1972) and modal model (Atkinson & Shiffrin, 1968; Waugh &

Norman, 1965). In more specific models, these substructures might be regarded

either as a single memory store with different modes of activation for long-term

and short-term components, or as separate memory stores. These distinctions are

not essential when considering the basic level of cognitive architecture. However,

in order to explain human cognition, this general model needs to be supplemented

by some attention control mechanism (central processor or central executive)

which determines what information from sensory stores or LTM is brought into

STM. The information that is actually attended to is limited to a small number of

chunks in STM (Simon, 1979; Ericsson & Simon, 1993a, 1993b).

Various cognitive architectures and elaborations of the general model extend

the described memory structure. For example, the concept of working memory

(WM) was introduced to account for processing of units of information that are

interconnected, rather than random, and should be processed concurrently because

of the nature of things they reflect or due to established associations in long-term

memory. Working memory is considered as "a system for the temporary holding

and manipulation of information during the performance of a range of cognitive

tasks" (Baddeley, 1986, p. 34), a “desktop of the brain … that keeps track of what

we are doing or where we are moment to moment, that holds information long

enough to make a decision, to dial a telephone number, or to repeat a strange

foreign word that we have just heard” (Logie, 1999, p. 174). Some simple

examples of working memory operation could be provided by the following tasks:


close your eyes and pick up a pen in front of you; count the number of windows in

your house or apartment; mentally rearrange the furniture in your room; or

mentally complete a mathematical operation (for more examples, see Logie,

1999).

After incoming stimuli from an external source are registered in sensory

memory, perceived or matched to recognizable patterns by using prior knowledge

(if any) in LTM and context, and are paid attention to, they are transferred into

WM. If a unit of information is not recognized due to the lack of appropriate LTM

patterns, it still could be attended to and processed in WM, with appropriate

cognitive resources allocated for the task. Attended units of information in WM

are assigned meaning and used for constructing integrated mental representations

of a situation or task (Figure 1). This information, however, may fade very

quickly if attention is diverted or if the capacity of WM is overloaded.

Baddeley and Hitch (1974) first proposed that WM performed both

processing and storage functions. They suggested three structural components of

working memory: a central executive and two separate auditory and visual stores

for handling verbal information and visual images. These two stores serve as

maintenance systems controlled by the central executive and are called

respectively an articulatory or phonological loop (‘inner voice’) and a visuospatial

sketchpad (‘inner eye’). The limited capacity of the central executive is used for

processing incoming information, with the remainder used for the storage of

intermediate and final products of that processing. Storage and processing

capabilities of WM trade off against each other. When memory load increases

above some threshold, our performance could be inhibited. To get a feeling of

WM limitations, try to mentally add two large numbers (for example, 83 468 437

and 93 849 040). For a concurrent task, you may try also to attend simultaneously

to a comedy show on your TV. It would be very difficult to do because each of

these activities alone may take all of your WM resources.

There are three major functional aspects of working memory operation:

temporary storage, manipulation of information, and executive control.

Temporary storage of information was the focus of classic models of STM and

was studied using standard word or digit STM span tasks. These were simple

tasks involving recalling a list of digits or unrelated words and not requiring much

prior knowledge. Active manipulation of information has been the focus of

models of WM and has been studied using WM span tests that require concurrent

processing of several tasks. These are relatively more complex tasks involving

meaningful cognitive operations such as reading sentences or performing

numerical transformations, and then recalling the final words of those sentences or

results of the math operations. Performance of complex cognitive tasks requires


simultaneous use and integration of various sources of information and coordination

of separate processes and representations. It is the executive functioning of WM and the

interactions between WM and LTM knowledge structures that have become the

focus of research in recent years (see Miyake & Shah, 1999, for a recent overview

of WM models and the state of the field).

A number of hypotheses have been proposed to explain individual differences

in WM capacity and its relation to performance. These theories considered

differences in total WM capacity, differences in processing efficiency of WM, or

both. According to the total capacity approach (Baddeley & Hitch, 1974; Cantor

& Engle, 1993; Case, 1985; Engle, Cantor, & Carullo, 1992), all cognitive

processes require resources from a fixed pool. Any resources not allocated to the

operations can be used for short-term storage. The storage and processing

capabilities of working memory trade off against each other. When memory load

increases above some threshold, a person’s performance may decline. A change in

total capacity caused, for example, by fatigue or age should affect the

performance in a wide range of tasks.

Figure 1. Basic architecture of human cognition. (Diagram labels: Sensory Memory: incoming information; Working Memory: constructing mental representations of a situation or task; Long-Term Memory: knowledge base.)


The task-specific hypothesis (Daneman & Carpenter, 1980) assumed that

WM capacity is specific to the particular task being performed. Efficient processing skills leave more WM capacity for storage of processing products. A

change in processing efficiency should be specific to a particular task and result

from intensive practice or training (Just & Carpenter, 1992). Performance would

be influenced only if available resources are in short supply when a person

operates at the limit of WM capacity. The processing efficiency approach assumes

that a single central system is responsible for the processing and temporary

storage of information. Its limited capacity must be shared between the processing

and the storage demands. Individuals with inefficient processes have a

functionally smaller storage capacity because they must allocate more resources to

the processes (Daneman & Carpenter, 1983; Daneman & Tardif, 1987).

Working memory capacity was measured in terms of operational capacity

dependent on the type of specific background task used in a particular domain

(Carpenter & Just, 1989). For example, the reading span test was used to measure

WM capacity as the largest size of the set of simple sentences from which a

subject can reliably recall the final words of all the sentences (Daneman &

Carpenter, 1983). Daneman and Tardif (1987) established that the reading span

was a measure specific to language skills, not a measure of general working

memory capacity, and it correlated significantly with reading comprehension

ability.

Although there obviously are systematic differences among individuals in

their working memory capacity for specific tasks, and these differences influence

performance when the person operates at the limit of his or her working memory

capacity, no single approach or hypothesis concerning the interpretation of

individual differences in WM capacity has received convincing empirical support.

Such differences could be strongly influenced by knowledge structures available

in long-term memory. Any WM span implicitly reflects an individual's knowledge

and experience in a domain, and this knowledge inevitably influences his or her

performance in both processing and storage parts of the task (e.g., Hulme,

Maughan, & Brown, 1991; Hulme, Roodenrys, Brown, & Mercer, 1995). WM

span measures thus could be used as predictors of the person’s performance in the

corresponding domain rather than measures of his or her true general WM

capacity. It is practically impossible to eliminate the influence of the person’s

knowledge base when meaningful tasks are involved in WM span tests. From this

point of view, approaches that focus on connections between the content and

operation of working memory and long-term memory could be more relevant and productive.


Simple chunking mechanisms provide an example of using long-term

memory structures in transforming the content of working memory. The chunk is

a familiar unit of information based upon previous learning. For example, it could

be difficult to remember and recall a string of random letters like

B,B,C,C,I,A,A,B,C,F,B,I, unless we chunk them together into BBC, CIA, ABC,

FBI. In this case, we use our prior knowledge stored in LTM to reduce the number

of elements to a manageable four chunks. The same method could be used with

the following string of numbers: 1,9,1,4,1,9,4,5,1,9,9,6,2,0,0,1. Another common

example of chunking in language comprehension is the way we chunk letters into

familiar words, and words into familiar phrases. An STM capacity estimate of

around seven units (Miller, 1956) actually indicates the number of chunks rather

than total amount of information stored in STM. This mechanism explains how

we manage to get around the information-processing bottleneck created by our

limited working memory capacity, and to learn the enormous amount of

knowledge in our LTM.
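The chunking example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of memory: the function name `chunk`, the greedy longest-match strategy, and the set `ltm` standing in for familiar units in long-term memory are all hypothetical choices made for this sketch.

```python
def chunk(items, known_chunks):
    """Greedily group a sequence of single-character items into familiar chunks.

    At each position the longest known chunk that matches is taken;
    an item that starts no known chunk stays a unit of its own.
    """
    by_length = sorted(known_chunks, key=len, reverse=True)
    sequence = "".join(items)
    chunks = []
    i = 0
    while i < len(sequence):
        for candidate in by_length:
            if sequence.startswith(candidate, i):
                chunks.append(candidate)
                i += len(candidate)
                break
        else:
            # No familiar chunk starts here: keep the single item.
            chunks.append(sequence[i])
            i += 1
    return chunks


letters = "B,B,C,C,I,A,A,B,C,F,B,I".split(",")
ltm = {"BBC", "CIA", "ABC", "FBI"}  # assumed familiar units in long-term memory

print(len(letters))                  # number of separate letters to hold
print(chunk(letters, ltm))           # the same letters grouped by prior knowledge
print(len(chunk(letters, set())))    # without prior knowledge nothing is grouped
```

With the familiar acronyms available, the twelve letters reduce to four units; with an empty knowledge base, every letter remains a separate unit.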

People can be trained to effectively increase their memory capacity to an

amazing degree through extensive training in chunking and re-chunking

information into meaningful units using their prior knowledge stored in LTM. The

skilled memory theory (Chase & Ericsson, 1982) claims that people develop

mechanisms that enable them to use a large and familiar knowledge base to

rapidly encode, store, and retrieve information within the area of their expertise

and thus circumvent the working memory capacity limitations. As a result, experts

possess an enhanced functional working memory capacity in domains of their

expertise (Ericsson & Staszewski, 1989).

Available domain-specific knowledge enables experts to quickly encode and

retain large amounts of information in LTM. Such LTM storage and retrieval

operations speed up with practice and are comparable with STM encoding and

retrieval, resulting in experts' superior task performance and superior recall for

familiar materials (the skilled memory effect; Ericsson & Staszewski, 1989). For

example, expert mnemonists can increase their digit spans far beyond the limit of

Miller's seven plus-or-minus two digits. They use familiar chunks of knowledge

in LTM to encode new information in an easily accessible form. Ericsson and

Staszewski (1989) described a person who expanded his digit span to 84 digits by

grouping them into short sequences and encoding them as athletic running times,

dates, and ages that were familiar to him. He nevertheless operated under the

constraints of limited-capacity STM: the size of digit groups never exceeded five

digits, and these groups were never clustered into supergroups of more than four

groups.


In the WM model of Carpenter and Just (1989), the operation of WM during

reading comprehension is also based on relations between WM and LTM. In this

model, WM consists of currently active pointers to LTM structures and partial or

final products of processing. A reader stores the theme of the text, the general

representation of the situation, the major propositions from preceding sentences,

as well as a representation of the sentence he or she is currently reading (Just &

Carpenter, 1992). When dealing with an unstructured series of words, we can

usually recall only six or seven unrelated words in order (according to our STM

span). Skillful readers, on the other hand, can recall and understand long

sentences (about 77% of words in up to 22-word sentences) because they use

internal structures in LTM to circumvent WM limitations. Thus, sentence

comprehension can be considered as recoding (chunking) incoming symbols into

some structure (Carpenter & Just, 1989).

Ericsson and Kintsch (1995) further developed these ideas into the theory of

long-term working memory (LT-WM). In this theory, LTM knowledge structures

associated with components of working memory form an LT-WM structure that is

capable of holding a virtually unlimited amount of information. Some additional

mechanisms were introduced for overcoming the effects of interference in experts'

use of LTM knowledge for storage and retrieval of newly encoded information.

The proposed mechanism of LT-WM operation involves cue-based retrieval of

information from LTM. The encoding method can be based on a

specifically constructed retrieval structure, an elaborated existing memory

structure, or a combination of the two. Skilled performance depends on domain-

specific knowledge structures relevant to particular tasks, and, consequently, there

are individual differences in the operation of LT-WM for a given task (Ericsson &

Kintsch, 1995).

KNOWLEDGE REPRESENTATIONS

Our knowledge base in LTM profoundly influences cognitive processes in

most situations. Therefore, forms of knowledge representations are critical for

understanding human cognition. Several major ways of representing the meaning

of information in memory have been suggested: propositional representations

(semantic networks), procedural representations (production systems), and

schemas. Analogical representations or mental models (Rumelhart & Norman,

1983) can be generally considered as schemas. The concept of a proposition

denotes the primitive unit of meaning, or the smallest unit of knowledge that

can be judged true or false. Networks of such


interconnected units can be used to represent the meaning of sentences and

pictures.

Newell and Simon (1972) suggested that knowledge could be represented by

a set of conditional rules, or productions, of the form condition → action. The production rules

are stored in long-term memory and are retrieved and used in working memory.

The current contents of working memory are matched against the conditions of all

the production rules in long-term memory. Whenever the conditions of a rule

occur in working memory, the rule is triggered and its action is carried out. Action

of the rule can change the contents of working memory and determine which rule

is triggered next. Thus, the principles determining how one rule is followed by

another are built into the rules themselves.
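The recognize-act cycle described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Rule` structure, the `run` function, the example rules about placing a phone call, and the choice to fire the first matching rule. It is not the formalism of Newell and Simon (1972).

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    condition: frozenset  # items that must all be present in working memory
    add: frozenset        # items the action places into working memory
    remove: frozenset     # items the action takes out of working memory


def run(rules, working_memory, max_cycles=10):
    """Match rules against working memory and fire them until none applies."""
    wm = set(working_memory)
    fired = []
    for _ in range(max_cycles):
        # Assumption: the first rule whose condition is satisfied is triggered.
        match = next((r for r in rules if r.condition <= wm), None)
        if match is None:
            break
        # The action changes working memory, which decides what fires next.
        wm = (wm - match.remove) | match.add
        fired.append(match.name)
    return fired, wm


# Hypothetical rules held in long-term memory.
RULES = [
    Rule("find-number",
         frozenset({"goal: call", "number unknown"}),
         frozenset({"number known"}),
         frozenset({"number unknown"})),
    Rule("dial",
         frozenset({"goal: call", "number known"}),
         frozenset({"call placed"}),
         frozenset({"goal: call"})),
]

fired, final_wm = run(RULES, {"goal: call", "number unknown"})
print(fired)
print(sorted(final_wm))
```

No separate control program orders the two rules: "dial" follows "find-number" only because the first rule's action produces the condition of the second.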

One of the most advanced theories based on the idea of production rules, the

ACT* theory (Adaptive Control of Thought; Anderson, 1983), or its updated

version ACT-R (R for rational; Anderson, 1993), suggests a separate type of long-term

memory for production rules (for skills) in addition to the declarative

memory (propositions, images, and other representations for facts and

experiences). The items in these memories can vary in their degree of ‘activity’. If

the contents of working memory match more than one rule in procedural memory

then whichever is the most active is triggered.

The concept of a schema, originally discussed by Bartlett (1932), came into

cognitive psychology from research in artificial intelligence (Minsky, 1975;

Bobrow & Winograd, 1977). Schemas generally represent the object as a set of

attributes (slots). Schemas abstract generalizations about objects from specific

instances and encode general categories and typical features. They may include not

only propositions, but also perceptual features (for example, spatial images) and

stereotypic sequences of events. Schemas may have slots with fixed or variable

values; slots with variable values usually have some default or most probable

values.

The most important features of schemas are stable patterns of relationships

between variables (slots). Each schema contains information about some class of

structures. When particular values are assigned to slots of a schema, a schema-

based knowledge structure could be obtained in the form of concepts,

propositions, etc. The obtained knowledge structures could be more general or

more specific depending on those values. Multiple schemas can be linked together

and organized into sophisticated hierarchical structures where one schema can

form part of a more complex schema.
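The idea of slots with fixed, variable, and default values can be sketched as a small data structure. The class `Schema`, its method `instantiate`, and the bird and garden examples are hypothetical and chosen only for illustration; real schematic knowledge is far less tidy than this sketch suggests.

```python
class Schema:
    """A schema as a set of slots (illustrative sketch only).

    fixed:    slots whose values are the same for every instance
    defaults: variable slots, each with its default (most probable) value
    """

    def __init__(self, name, fixed=None, defaults=None):
        self.name = name
        self.fixed = dict(fixed or {})
        self.defaults = dict(defaults or {})

    def instantiate(self, **values):
        """Fill variable slots; any slot left unfilled keeps its default."""
        unknown = set(values) - set(self.defaults)
        if unknown:
            raise KeyError(f"{self.name} has no variable slot(s) {sorted(unknown)}")
        return {"schema": self.name, **self.fixed, **self.defaults, **values}


# Hypothetical example schemas.
bird = Schema("bird",
              fixed={"has_feathers": True},
              defaults={"can_fly": True, "size": "small"})
garden = Schema("garden", defaults={"visitor": bird.instantiate()})

sparrow = bird.instantiate()                          # all default values
penguin = bird.instantiate(can_fly=False, size="medium")
scene = garden.instantiate(visitor=penguin)           # one schema inside another

print(sparrow)
print(scene["visitor"]["can_fly"])
```

Assigning different values to the same slots yields more general or more specific knowledge structures, and the garden example shows one schema forming part of a more complex one.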

Schemas may represent knowledge of all kinds and levels: from individual

letters (allowing us to recognize different variations of handwritten letters) to

complex electronic or organizational systems, behavioral patterns, visual and


auditory perceptual images. For example, our schema for a human face includes

slots for eyes, a nose, a mouth, ears, etc. These components are arranged in a

certain configuration that is not a rigid one. However, some general requirements

should be met: the nose and eyes should be located above the mouth; eyes should

be located above the nose on different sides of it, etc. This general schema allows

us to recognize instances of human faces in limitless situations, including some

peculiar forms of visual arts.

A student’s schema for solving linear algebraic equations of the type ax = b

may include three slots: 1) a number b on the right hand side of the equation; 2) a

number a on the left hand side of the equation; and 3) the division operation:

divide the content of the first slot by the content of the second slot. For less

experienced students, the schema may include the operation of dividing both sides

of the equation by the same number a. In this case, the schema would contain

slots for both parts of the equation, the dividing number a, and the division

operation.
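The two versions of the schema for ax = b can be contrasted in a short sketch. The function names `solve_expert` and `solve_novice` are hypothetical labels for this illustration, and the code assumes a is not zero.

```python
from fractions import Fraction


def solve_expert(a, b):
    """Three slots: b (right side), a (left side), and the division b / a."""
    return Fraction(b) / Fraction(a)


def solve_novice(a, b):
    """Keeps both sides of the equation and divides each side by a."""
    left, right = Fraction(a), Fraction(b)   # a * x = b
    left, right = left / a, right / a        # (a / a) * x = b / a
    assert left == 1                         # the coefficient of x is now 1
    return right


print(solve_expert(4, 10))
print(solve_novice(4, 10))
```

Both procedures give the same solution; the second simply carries more elements (both sides of the equation) through the operation.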

For an example of higher-level schematic knowledge representations,

consider the technical domain that includes knowledge about various technical

objects (e.g., tools, devices, machines, technological procedures). This variety of

knowledge in any technical area could be represented with different levels of

specification: from descriptions of general features to specific details. A

schematic framework for representing knowledge about a technical object may

include three main interconnected components that could be referred to as

functional, operational, and structural descriptions. Any technical object could be

characterized by some functions or purpose it was designed for (what is this

object for?), processes utilized in the object’s operation (how does it operate?),

and the object’s internal structure including links between its components (what

does it consist of?). To explain an object’s operation means to explain why a

given set of linked parts performs specific functions utilizing certain processes

during operation. A learner should establish connections between functional,

operational, and structural components of the object’s description in order to

understand how it works (Kalyuga, 1984; 1990).

Gruber and Russell (1996) suggested similar classes of an artifact description:

structure (the physical and/or logical composition of an artifact in terms of the

composition of parts and connection topologies), behavior (something an artifact

might do in terms of observable states or changes), function (effect or goal to

achieve by artifact behavior), requirements (prescriptions concerning the

structure, behavior, and/or function that the artifact must satisfy), and objectives

(specifications of desired properties of the artifact other than pure functions, such


as cost and reliability). Requirements and objectives could be generally included

into the functional description (as functional requirements and general functions).

Figure 2. General schematic structure of technical knowledge. (Diagram labels: functions of the object; alternative combinations of processes realizing a set of functions; alternative technical solutions realizing a combination of processes.)

Each of the above aspects of technical knowledge may have different levels of

generalization. It is possible to describe an object in very general terms (a global

level or general overview) or in more detail with different levels of specification.

When combined together, all aspects, components, and levels of the description of

a technical object create a sophisticated multilevel hierarchical schematic

structure of technical knowledge. In an abstract form, this structure could be

represented by the graph in Figure 2. Three levels of description are shown for


functions, processes, and structural components of a technical object. Simple and

superficial knowledge about the object may include only isolated components

corresponding to the upper rows in the depicted clusters of knowledge elements.

Further deepening of knowledge requires establishing relations between these

components and adding elaborated knowledge on more specific levels of

description.

There are many definitions of schemas depending on the theoretical

perspective of the researcher. It is practically impossible to precisely describe the

schematic knowledge structures held by an individual. As Norman (1983) noted,

"we must … discard our hopes of finding neat, elegant mental models, but instead

learn to understand the messy, sloppy, incomplete, and indistinct structures that

people actually have" (p. 14). In general, a schema can be described functionally

as a cognitive construct (an organized knowledge structure) that allows people to

classify information according to the manner in which it will be used (e.g., Chi,

Glaser, & Rees, 1982; Sweller, 1993). Such organized knowledge structures

represent a major mechanism for extracting meaning from new information,

acquiring and storing knowledge, circumventing the limitations of working

memory, increasing the strength of memory, and recalling information. They

impose an organization on the information, guide retrieval, and provide

connections to prior knowledge.

In schema theory, the process of learning can be considered as encoding new

information in terms of existing schemas, as schema modification, or as the

creation of new schemas. The creation or modification of a schema is based on

conscious cognitive processing of information in working memory. In a more

general context, schema acquisition could be regarded as an example of a non-

linear process where the schema emerges from lower-level components during

learning or practice. As a cognitive unit, the schema represents a higher level of

organization than just a simple collection of lower-level components.

The need for the emergence of higher levels of schema hierarchy could be

associated with general limitations of human information processing. In a wider

context, any qualitatively new level of a system emerges in a non-linear way as a

means to overcome the combinatorial barrier caused by the immense number of

possible combinations of the variety of elements of the previous, lower level.

Examples of such processes are the emergence of the molecular level from atoms,

biochemical structures from molecules, or nerve impulses from biochemical

structures (Scott, 1995; Turchin, 1977). Structured neuronal groups might

represent the qualitatively new biological level of conscious cognitive functioning

(Edelman, 1992). On the psychological level of description, our abstract

high-level schematic knowledge representations in long-term memory (and


corresponding intellectual abilities associated with operating such structures)

might have emerged as a means of overcoming the combinatorial barrier under

conditions of limited processing capacity.

Because a schema is treated as a single unit in working memory, such high-

level structures require less working memory capacity for processing than the

multiple, lower-level elements they contain, making the working memory load

more manageable. Our abilities to construct and use higher-order hierarchical

cognitive configurations of knowledge structures in long-term memory might

have emerged during evolution as a way of providing structure to the elements

being dealt with by working memory (Sweller, 2003, 2004). Thus, by allowing

multiple elements to be treated as a single element in working memory, long-term

memory schematic structures may have, as one of their functions, the reduction of

working memory load.
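
As a rough illustration of this point, the following Python sketch counts the units that working memory would have to hold when the same input is processed with no schemas, with one schema, and with two. The schema names, the letter sequence, and the greedy left-to-right matching are assumptions made for the example; they are not part of the cited theory.

```python
# Hypothetical schemas: each name packages a run of lower-level elements.
PARTIAL_SCHEMAS = {"the": ("t", "h", "e")}
FULL_SCHEMAS = {"the": ("t", "h", "e"), "cat": ("c", "a", "t")}

def working_memory_units(elements, schemas):
    """Greedily replace known runs of elements with a single schema unit."""
    units = []
    i = 0
    while i < len(elements):
        for name, parts in schemas.items():
            if tuple(elements[i:i + len(parts)]) == parts:
                units.append(name)      # the whole run counts as one unit
                i += len(parts)
                break
        else:
            units.append(elements[i])   # unfamiliar element stays separate
            i += 1
    return units

letters = list("thecat")
print(len(working_memory_units(letters, {})))               # no schemas
print(len(working_memory_units(letters, PARTIAL_SCHEMAS)))  # one schema
print(len(working_memory_units(letters, FULL_SCHEMAS)))     # two schemas
```

The more of the input that is covered by available schemas, the fewer separate units remain to be held at once.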

Specific schema selection in a particular situation is usually automated and

quick. Our first impression about an unfamiliar person (which is said to be the

most important), our comprehension of movies, fiction, music, humor, or art is

guided by our acquired domain-specific schematic knowledge structures. Schemas

guide our recall of different past events. Our memory usually retains the gist of a

situation or event according to our schematic knowledge of it. The schema defines what is encoded and stored. When recalling the event, we create schema

instantiations filling in missing information and inferring unavailable components

using our schemas for the event. Sometimes such recall may produce various

distortions to fit our schemas or expectations (e.g., recall scenes of court

procedures from movies and fiction stories with witnesses remembering details

they have not actually witnessed).
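
The reconstructive character of such recall can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the courtroom schema, its slot names, and its default values are invented for the example, and real schema instantiation is certainly richer than a dictionary merge.

```python
# Hypothetical "courtroom testimony" schema: slot names and default values
# are invented for illustration only.
COURTROOM_DEFAULTS = {
    "judge": "present",
    "witness_sworn_in": True,
    "lawyer_objects": True,
}

def recall(encoded, schema_defaults):
    """Reconstruct an event: stored details win, gaps are filled from the schema."""
    reconstruction = dict(schema_defaults)
    reconstruction.update(encoded)
    inferred = sorted(set(schema_defaults) - set(encoded))
    return reconstruction, inferred

# Only part of the event was actually encoded.
encoded_event = {"judge": "present", "witness_sworn_in": True}
memory, inferred = recall(encoded_event, COURTROOM_DEFAULTS)
print(memory["lawyer_objects"])  # filled in from the schema, never observed
print(inferred)                  # slots that were inferred rather than stored
```

The recalled event contains a detail that was supplied by the schema rather than by observation, which is the kind of distortion described above.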

The structure of the schematic knowledge can be empirically assessed, for

example, by asking students to group problems into clusters on the basis of

similarity; to categorize problems after hearing only part of the text; to provide answers to problems when content words have been replaced by nonsense words;

to solve problems when material in the text is ambiguous; to contrast problems

using a nominated principle; to recall problems that were presented earlier; to

identify which information within problems is necessary and sufficient for

solution; and to classify problems in terms of whether the text of each problem

provides sufficient, missing or irrelevant information for solution (‘text editing’)

(Low & Over, 1992).

Previously acquired schematic knowledge structures are the most important

factor that influences learning new material. A student’s understanding of an instruction means instantiation of appropriate familiar schemas that would allow

her or him to assimilate new information with prior knowledge. A failure to


comprehend instruction might be caused by the lack of any appropriate schemas

in LTM, by the lack of sufficient cues in the situation to elicit a schema, or by the learner applying a different schema than that intended by the instruction.

Students' preexisting schemas often resist change: everything that cannot be

understood within the available schematic frameworks is ignored or learned by

rote. It is important to build new knowledge on top of students' existing schemas

or help them to acquire an appropriate schematic framework by relating it to

something already known. Useful instructional techniques could be analogies or

diagrams, to establish links with existing knowledge, and advance-organizers to

elicit or activate existing relevant schemas or provide new ones (concept maps,

headings, summaries at the start of chapters, etc.).

Similar to production systems, a schema-based approach to representing

knowledge provides a general framework that can be instantiated by specific

theories. In all schema-based models of cognitive architecture, schemas are

matched to the contents of working memory for recognition. If a schema is

partially matched by the information in working memory, it will create further

information to complete the match. Schemas instantiated in working memory

could be modified or reorganized, then placed back into long-term memory and

serve as a new, more specific schema for further recognition.

Schema theories do not differentiate between procedural and declarative

knowledge. Instructions for actions may be produced by matching a schema to a

situation and adding missing pieces of information. For example, recognizing a

situation as an instance of a schema for solving a simple linear algebraic equation and recognizing

the values of the corresponding slots would provide directions for the necessary operations.

Production rules could be considered as a form of schematic knowledge. There is

a tendency towards convergence of the production system and schema-based

approaches. For example, Koedinger and Anderson (1990) integrated

the two approaches by constructing a computational (production-system-style) model of solving geometry problems using schema-based knowledge structures. The

schemas (‘diagram configuration schemas’) were described as clusters of

geometry facts that were associated with a single prototypical geometric image.
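
The linear-equation example mentioned above can be made concrete with a minimal Python sketch. It assumes a hypothetical schema for equations of the form ax + b = c with three slots (a, b, c), restricted to non-negative integers. The pattern, the function names, and the wording of the generated steps are illustrative assumptions, not a model taken from the cited literature.

```python
import re
from fractions import Fraction

# Hypothetical schema for equations of the form a*x + b = c, with three slots.
PATTERN = re.compile(r"^\s*(\d+)\s*x\s*\+\s*(\d+)\s*=\s*(\d+)\s*$")

def instantiate(problem):
    """Return the filled slots if the problem matches the schema, else None."""
    match = PATTERN.match(problem)
    if match is None:
        return None
    a, b, c = (int(g) for g in match.groups())
    if a == 0:
        return None  # not a linear equation in x
    return {"a": a, "b": b, "c": c}

def directions(slots):
    """Once the slots are filled, the schema supplies the operations to perform."""
    a, b, c = slots["a"], slots["b"], slots["c"]
    return [
        f"subtract {b} from both sides: {a}x = {c - b}",
        f"divide both sides by {a}: x = {Fraction(c - b, a)}",
    ]

slots = instantiate("4x + 2 = 3")
for step in directions(slots):
    print(step)
```

In this sketch the same structure serves both to recognize the situation (declarative role) and to generate the actions (procedural role), which is the sense in which the two kinds of knowledge are not separated.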

In this book, schematic knowledge structures will be used as the basic unit

and prevailing form of knowledge representations in long-term memory.

Accordingly, the approach to human performance that is based on studies of

schematic knowledge structures will be further referred to as a schema approach.


PROBLEM SOLVING AND THE NATURE

OF HUMAN EXPERTISE

All of our purposeful cognitive activities can be considered as problem

solving. Initially, in the 1950s and 1960s, most research studies on problem

solving were concerned with knowledge-lean task domains that required no

special training or background knowledge (for example, the famous ‘Tower of

Hanoi’ task, various puzzles, etc.). The study of such tasks led to the formulation

of a general theory of human problem solving (Newell & Simon, 1972). In this

theory, a problem contains three main components: a given state, a goal state, and a set of operators for transforming the given state into the goal state. Problem-

solving activity is considered as a search in the problem space that consists of

separate problem states (knowledge states). The task of problem solving is to find

a sequence of operators that can transform the initial state into a goal state within

the problem space.
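
The three components (given state, goal state, operators) and the idea of search in a problem space can be sketched for the three-disk Tower of Hanoi task. The state encoding and the use of exhaustive breadth-first search are assumptions chosen for brevity; the sketch is not a model of how people actually search, and it is not the formulation used by Newell and Simon.

```python
from collections import deque

def moves(state):
    """Operators: move the top disk of one peg onto another peg."""
    for src in range(3):
        if not state[src]:
            continue
        disk = state[src][-1]
        for dst in range(3):
            # A disk may only be placed on an empty peg or on a larger disk.
            if dst != src and (not state[dst] or state[dst][-1] > disk):
                pegs = list(state)
                pegs[src] = state[src][:-1]
                pegs[dst] = state[dst] + (disk,)
                yield (src, dst), tuple(pegs)

def solve(given, goal):
    """Breadth-first search of the problem space for a shortest operator sequence."""
    frontier = deque([(given, [])])
    visited = {given}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for op, nxt in moves(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [op]))
    return None

given = ((3, 2, 1), (), ())   # all disks on the first peg, largest at the bottom
goal = ((), (), (3, 2, 1))
plan = solve(given, goal)
print(len(plan), plan)
```

Each element of the returned plan is one operator application, written as a (source peg, destination peg) pair.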

So-called weak methods could be used in solving knowledge-lean tasks. We

often use general heuristics (rules of thumb) for choosing necessary sequences of

operators. For example, the difference reduction heuristic suggests choosing

operators that maximally reduce the difference between the current state and the desired state. However, this method does not guarantee success in solving the

problem, and more advanced methods are usually adopted. Forward chaining

starts with the initial problem state, and a selected heuristics-based operator is

applied, and then the strategy repeats. Backward chaining starts with the desired

solution state, and a heuristically chosen operator is applied in reverse. A

subgoaling strategy chooses an operator and forms a subgoal to find a way to

change the current state so that the chosen operator could be applied. The method

of solving by analogy uses the structure of the solution to one problem to obtain

the solution to another problem (van Lehn, 1989).
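
A minimal sketch of the difference reduction heuristic shows why it does not guarantee success. The number puzzle, the two operators, and the distance measure are invented for illustration. The loop assumes an integer-valued difference, so that a strictly decreasing difference must terminate.

```python
def difference_reduction(state, goal, operators, difference):
    """Greedy search: always apply the operator that most reduces the difference."""
    path = []
    while state != goal:
        name, nxt = min(
            ((name, op(state)) for name, op in operators.items()),
            key=lambda pair: difference(pair[1], goal),
        )
        if difference(nxt, goal) >= difference(state, goal):
            return None          # no operator reduces the difference: stuck
        state = nxt
        path = path + [name]
    return path

# Hypothetical puzzle: reach a target number using only "+5" and "-3".
operators = {"+5": lambda s: s + 5, "-3": lambda s: s - 3}
distance = lambda s, g: abs(g - s)

print(difference_reduction(0, 10, operators, distance))  # reaches the goal
print(difference_reduction(0, 4, operators, distance))   # gets stuck
```

The second call fails even though a solution exists (0 + 5 + 5 - 3 - 3 = 4), because reaching it requires a step that temporarily increases the difference.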

The weak methods are often used in combined forms. For example, the GPS

(General Problem Solver) production system-based mechanism developed by

Newell and Simon (1972) uses the means-ends analysis method. This method

consists of looking for an operation that reduces the difference between the goal

and initial state, setting up subgoals whose solution provides a solution of the

original goal, and building up a hierarchical plan to solve a problem. Means-ends

analysis thus combines forward chaining and operator subgoaling: the current

state of problem solving is compared to the goal state and actions are selected to

reduce the difference (van Lehn, 1989).


In the early 1980s, experiments with puzzle problems demonstrated that, even

after extensive problem solving by means-ends analysis, participants still did not induce a simple solution rule. Rule induction occurred only after some additional

information had been provided (Mawer & Sweller, 1982; Sweller & Levine, 1982;

Sweller, Mawer, & Howe, 1982). Empirical evidence was obtained that extensive

practice in conventional problem solving was not an effective way of acquiring

schemas that are required to successfully solve corresponding problems (Owen &

Sweller, 1985; Sweller & Cooper, 1985; Sweller & Levine, 1982; Sweller,

Mawer, & Ward, 1983). These studies suggested that a means-ends strategy could

inhibit schema acquisition.

A means-ends strategy focuses attention on specific features of the problem

situation required to reach the goal and on reducing the difference between current

and goal problem states by selecting proper operators. Maintaining subgoals and

considering alternative solution pathways are cognitively demanding mental

activities that might result in working memory overload. Additionally, these

activities are unrelated to learning solution schemas that are critical for successful

future problem solving. They reduce resources devoted to learning other

important aspects of problem structure. For example, studies of two-step problems

demonstrated that cognitive load might be very high at the subgoal stages, resulting in more errors than at the final goal stage (Ayres & Sweller, 1990).

Sweller & Levine (1982) demonstrated rapid learning of maze problem-

solving schemas when the specific goal state was unknown, and it was not

possible to reduce differences between the goal and given problem states. Sweller,

Mawer, and Ward (1983) found that using a means-ends strategy can actually

impair learning, and that less directed exploration of the problems facilitated

acquisition of useful problem schemas. They used simple physical and geometry

problems without a specific goal stated (goal-free problems such as ‘Calculate the

value of as many variables as you can’) and observed enhanced development of problem-solving skills. Owen and Sweller (1985) found that problem solvers

using a means-ends strategy made significantly more errors than those using other

methods, supposedly due to the working memory load associated with means-

ends analysis.

In a theoretical investigation of the cognitive (working memory) load

phenomena, Sweller (1988) constructed and analyzed a computational model of

cognitive processes based on a theory of production systems (Newell & Simon,

1972). The model operates by matching elements on the condition side of each

production to elements in a working memory (for example, the knowns, unknowns, goal, possible equations or theorems). If the condition side of a

production is matched by some of the elements in working memory, the


production can fire, and its action alters the content of working memory allowing

other productions to fire. The cognitive load in such a model could be measured by considering the number of statements in working memory, the number of

productions, the number of cycles to solution, and the total number of conditions

matched. Application of this model to novice cognitive behavior in various

instructional procedures provided evidence of the heavy cognitive load associated

with a means-ends strategy compared with a forward-working goal-free strategy.

It also explained why the use of goal-free problems or worked examples was a more

effective means of acquiring schemas than conventional problem solving

(Sweller, 1988; Ayres & Sweller, 1990).
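
The match-and-fire cycle and the load measures listed above can be illustrated with a toy forward-working production system. This is not a reconstruction of the model in Sweller (1988). The two kinematics productions, the statement strings, and the way conditions are counted (only for productions that fire) are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    name: str
    conditions: frozenset   # statements that must all be in working memory
    additions: frozenset    # statements the action writes to working memory

def run(productions, working_memory, goal, max_cycles=100):
    """Fire one matching production per cycle until the goal statement appears."""
    wm = set(working_memory)
    stats = {"cycles": 0, "conditions_matched": 0, "peak_statements": len(wm)}
    while goal not in wm and stats["cycles"] < max_cycles:
        fired = False
        for p in productions:
            # A production fires if its conditions are matched and it adds something new.
            if p.conditions <= wm and not p.additions <= wm:
                wm |= p.additions
                stats["cycles"] += 1
                stats["conditions_matched"] += len(p.conditions)
                stats["peak_statements"] = max(stats["peak_statements"], len(wm))
                fired = True
                break
        if not fired:
            break  # no production applies: the model is stuck
    stats["solved"] = goal in wm
    return stats

# Hypothetical productions: derive an unknown once the other variables are known.
productions = [
    Production("v = v0 + a*t",
               frozenset({"known v0", "known a", "known t"}),
               frozenset({"known v"})),
    Production("d = (v0 + v)/2 * t",
               frozenset({"known v0", "known v", "known t"}),
               frozenset({"known d"})),
]
stats = run(productions, {"known v0", "known a", "known t"}, goal="known d")
print(stats)
```

A means-ends variant would additionally have to hold goal and subgoal statements in working memory, which is one way such a model can register a heavier load than forward working.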

Since the late 1970s, the research focus in problem solving shifted to studying

knowledge-rich task domains (algebra, geometry, physics, thermodynamics,

computer programming, chess, bridge, etc.) that required an essential knowledge

base as a prerequisite. Problem solving in such domains has additional

complexities. Representation of a problem requires a great deal of domain

knowledge, and operators that are usually used are domain-specific operators. The

central questions of research in such domains are how is knowledge used to build

up a problem representation and how does it influence the actual problem-solving

process (Reimann & Chi, 1989).

In semantically rich domains, problem solving involves searching one's

knowledge of the domain in order to find the operators for solving the problem.

Research on the use of knowledge in problem solving suggests that people use

two types of domain-specific knowledge to solve problems: declarative

conceptual knowledge (knowledge of the principles of the domain) and procedural

knowledge (knowledge of how to perform cognitive activities). Procedural

knowledge may be described as a set of production rules that define actions for

achieving goals (Anderson, 1983). Conceptual and procedural knowledge in

problem solving can be considered as organized into problem schemas. They form the general framework of knowledge that corresponds to classes of problems.

Problem solving in complex domains thus can be viewed as finding an

appropriate problem schema in long-term memory and filling in this schema with

the specific parameters of the problem (Chi, Feltovich, & Glaser, 1981; Chi &

Glaser, 1985). The problem schema determines what conceptual knowledge is

used to build a representation of the problem statement, and what procedures are

used to solve the problem. Much research in knowledge-rich domains is

concerned with the differences between expert and novice problem solving. It has

become evident that experts' behavior is mostly determined by their knowledge base. Therefore, the learning processes in which the experts acquired this

knowledge are critical in explaining their performance. The focus of attention in


the later studies shifted to learning theories as theories of the acquisition of

expertise (Van Lehn, 1989).

A considerable number of recent research studies in cognitive psychology

have been concerned with the investigation of the structures and processes of

human competent performance as a consequence of learning. It is generally

accepted that development of expert performance is a very complex process

involving a great deal of deliberate effort. Studies have shown that at least 10

years of practice are necessary for people in various fields of culture and science

to reach superior levels of skilled performance (Ericsson & Charness, 1994;

Ericsson, Krampe, & Tesch-Romer, 1993; Simon & Chase, 1973).

Expert performance is usually acquired during extensive deliberate practice in

a domain. Such practice should be organized at an appropriate and challenging

level of difficulty, allow steady skill refinement by repetition and error correction,

and provide informative feedback to the learner (Ericsson et al., 1993; Ericsson &

Lehman, 1996). Competent expert performance generally requires well-developed

cognitive skills, well-organized structures of knowledge, and self-regulatory

performance control or metacognitive strategies (Glaser, 1990).

Well-developed cognitive skills as a major characteristic of expert

performance require functional (related to conditions of applicability) automated knowledge (Fitts & Posner, 1967; Anderson, 1983, 1993; Klahr, Langley, &

Neches, 1987). The process of skill learning is claimed to occur in several stages.

In the first stage (cognitive stage), a description of the procedure is learned in the

form of declarative knowledge. In the second stage (associative stage), the

declarative information is transformed into a procedural form, and a set of

procedures for performing the skill is acquired. Such a process of converting

declarative knowledge into a procedural form is called proceduralization. In this

stage, two forms of knowledge (declarative and procedural) coexist. In the third

stage (autonomous stage), the skill becomes more rapid and automatic (Anderson, 1983).

When knowledge becomes automated during the development of proficiency,

conscious processing capacity can be concentrated on higher levels of cognition.

Automated performance requires a limited attentional capacity. Processing that

once demanded active control, after extensive practice can become automatic,

freeing limited attentional capacity for other tasks (Kotovsky, Hayes, & Simon,

1985; Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). For example,

while the use of declarative knowledge initially requires much conscious

cognitive processing, automatic application of proceduralized knowledge frees working memory and allows its capacity to be used for the processing of new

knowledge. Intensive training on certain procedural elements of a task can make


them more automatic and free cognitive capacity for other more creative elements.

This is especially important for transfer of training (Cooper & Sweller, 1987; Howell & Cooke, 1989). Automated lower level routine procedures enable

learners to concentrate on finding new ways of applying their knowledge in

unfamiliar situations.

The process of learning could be considered as the acquisition of new

schemas that eliminate the need to apply weak problem-solving methods (e.g.,

means-ends analysis) to solve future similar problems. The result is a shift from a

novice strategy of working backward from the goal using means-ends analysis

and subgoaling, to a more expert knowledge-based strategy of working forward

from the initial state to the goal. Availability of a sufficient set of relevant

domain-specific schematic knowledge structures that could be used in performing

tasks is an important feature of competent human performance. With experience

in a domain, knowledge is organized into larger interconnected aggregate

structures that explain the skilled performance of experts (Chi, Glaser, & Farr,

1988; Lord & Maher, 1991).

Under a schema-based approach, learning can take different forms. Schema

evolution is a central mechanism in the development of expertise. New

information could be encoded in terms of existing schemas without involving any new schemas. Schemas evolve as they are applied and utilized as learner

experience in the domain increases. Another form of learning is restructuring or

creation of new schemas. In order to explain how schemas can be built up through

experience, Rumelhart and Norman (1981) proposed a mechanism of learning by

analogy. Initially, a new schema could be created by modeling it on an existing

schema followed by a process of refinement (tuning). When a learner encounters a

new situation in a familiar domain, she or he tries to interpret it using existing

schemas. If none of them suits the situation, the best existing schema can serve as

a model from which to start the tuning process. The characteristics of this model that do not contradict the new situation are carried over into the new schema.

Planning and self-regulatory (metacognitive) skills allow experts to control

their performance, assess their work, and predict its results. These self-regulatory

skills are an important condition of expert ability to use the available knowledge

base (Chi, Bassok, Lewis, Reimann, & Glaser, 1989; Larkin, McDermott, Simon,

& Simon, 1980). Chi et al. (1989) proposed that students learn and understand

examples of problem solutions via the self-explanations they give while studying.

Students who are successful problem-solvers tend to study example exercises by

explaining and providing justifications for each action and relating these actions to the principles and concepts of the domain. These students read the example

with understanding and self-monitoring. Students who are less successful


problem-solvers do not connect their explanations (if any) with their

understanding of the principles of the domain. During problem solving, successful students may use examples for a specific reference, whereas less successful

students repeat them in search of ready-made solutions. The level of performance

significantly depends on the metacognitive skills that learners bring to the task.

Cognitive studies of human performance and learning have the potential to

greatly influence instructional design principles. Generally, instructional design

should minimize learners' involvement in activities that overburden their limited

working memory and be adapted to the learners’ available knowledge structures

in long-term memory. Appropriate design of instruction should be based on the

knowledge of characteristics of expert performance, expert-novice differences,

and the transition process from novice to expert. Cognitive models of expert

performance and their influence on the design of instruction are considered in the

following chapter.


Chapter 2

COGNITIVE STUDIES OF EXPERT-NOVICE

DIFFERENCES AND DESIGN OF INSTRUCTION

SCHEMA-BASED APPROACH TO STUDYING

EXPERT PERFORMANCE

The purpose of cognitive studies of human expertise is to identify the

cognitive structures and processes responsible for skilled performance. Expert

performance has been studied in a variety of domains, for example, chess (de

Groot, 1965), physics (Chi, Feltovich, & Glaser, 1981; Larkin, McDermott,

Simon, & Simon, 1980), programming (Anderson, Boyle, & Reiser, 1985) and

radiology (Lesgold, Rubinson, Feltovich, Glaser, Klopfer, & Wang, 1988), to

name just a few. Various techniques and approaches have been applied to find out

the organization of experts' knowledge, the characteristics of their understanding,

information processing requirements and the nature of competency in such areas

as chess (Chase & Simon, 1973; Simon, 1979), geometry (Greeno, 1977),

genetics (Smith & Goodman, 1984), physics (Larkin & Reif, 1976), electronic

troubleshooting (Brown & Duguid, 1989; Forbus & Gentner, 1986; Gitomer,

1988; Lesgold & Lajoie, 1991; Morris & Rouse, 1985; Perez, 1991; Rasmussen,

1986; Swezey, Perez, & Allen, 1988; Tenney & Kurland, 1988; Wiggs & Perez,

1988), and mechanical troubleshooting (de Kleer & Brown, 1983, 1984; diSessa,

1983; Forbus, 1984; Hegarty, 1991; Hegarty & Just, 1989; Heller & Reif, 1984;

Miyake, 1986; Reif, 1987; Stanfill, 1983; White, 1983; White & Frederiksen,

1986).

As discussed in the previous chapter, schemas are a major type of knowledge

representation in long-term memory that reflects prototypical features of objects,


situations, and events. To understand or interpret incoming information, the

human cognitive system matches this information with existing schemas (Rumelhart & Norman, 1983). In general, studies of expert-novice differences

demonstrate that expertise is not so much a function of superior problem-solving

strategies or a better working memory, but rather experts have a better domain-

specific schematic knowledge base.

Chunks have played an important role in the development of the

understanding of expert-novice differences. Since Miller's (1956) finding that

short-term memory is limited to approximately seven units, or chunks, of

information, a chunk has served as a unit of measurement for memory capacity. A

chunk can be considered as a generalized example of a schema. De Groot (1965;

1966) was one of the first psychologists who investigated expert-novice

differences and demonstrated that expertise can be explained by the enormous

amounts of knowledge that experts can access. In his classic studies, chess players

had to reconstruct the positions of chess pieces on a board, after a brief exposure

(5 seconds). De Groot's finding that chess masters could recall many more pieces

from briefly exposed real chess positions than novices was explained by masters

having larger chunks. Chase and Simon (1973) noticed that experts placed chess

pieces on the board in groups that represented meaningful configurations. The experts did not show superior performance when random placements of the chess

pieces were used.

Egan and Schwartz (1979) studied expertise in electronics with a

methodology similar to that used by Chase and Simon (1973) in studying chess

expertise. They found that experts could reconstruct large circuit diagrams from

memory recalling them in chunks of meaningfully related components. The

experts were better than novices at recalling meaningful (not random) circuit

diagrams. The size, rather than number, of recalled chunks increased with study

time. Chase and Ericsson (1982) further suggested that the superior memory of chess masters and other experts was due to possession of schema structures with

specific slots filled in with the index information that served as retrieval cues. The

material could be recalled by reading out the contents of these slots and selecting

schemas that corresponded to familiar stimuli.

The schema-based approach was successfully used to explain various

phenomena related to expert performance and differences between experts and

novices (Chi et al., 1981; Reimann & Chi, 1989). For example, in the domain of

physics, experts' categories were based on the principles of mechanics

(conservation of energy and momentum, etc.), whereas novices' categories were based on objects and surface features stated in each specific problem (inclined

plane, spring, etc.). In the case of an object being balanced on an inclined plane,


the experts saw it as an example of a class of problems requiring a balance-of-

forces approach, while novices saw it as an inclined planes problem type. The failure of a novice to solve this problem may result from the fact that different

inclined plane tasks may require different approaches (based on balance of forces,

energy conservation, etc.), and the presence of the inclined plane alone does not

determine the appropriate approach.
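
The contrast between the two ways of categorizing can be sketched by grouping the same small set of problems under two different keys. The four problems and their tags are invented for illustration and are not the materials used in the cited studies.

```python
from collections import defaultdict

# Hypothetical physics problems, each tagged with a surface feature and the
# principle needed to solve it (tags invented for illustration).
PROBLEMS = [
    {"id": 1, "surface": "inclined plane", "principle": "balance of forces"},
    {"id": 2, "surface": "inclined plane", "principle": "energy conservation"},
    {"id": 3, "surface": "spring", "principle": "energy conservation"},
    {"id": 4, "surface": "spring", "principle": "balance of forces"},
]

def categorize(problems, key):
    """Group problem ids by the value of the chosen attribute."""
    groups = defaultdict(list)
    for p in problems:
        groups[p[key]].append(p["id"])
    return dict(groups)

print(categorize(PROBLEMS, "surface"))    # novice-style grouping
print(categorize(PROBLEMS, "principle"))  # expert-style grouping
```

Under the surface grouping, problems 1 and 2 fall into one category even though they call for different solution approaches, which is the difficulty described above.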

One of the reasons for novices' difficulties in problem solving is that they

activate only lower-level schemas that incorporate only surface aspects of the

problem, whereas experts activate higher-level schemas that contain information

critical to the problem solution (Chi & Glaser, 1985). Thus, experts categorize

problems in terms of deep structures such as the laws used to solve the problems,

while novices categorize problems based on surface structures such as common

physical attributes. The same problem may elicit different schemas for experts

than for novices.

Schematic knowledge structures in long-term memory effectively provide

necessary executive guidance during high-level cognitive processing (Sweller,

2003). Without such guidance and in the absence of external instructions, people

usually resort to random search or weak problem-solving methods such as means-

ends analysis (a gradual reduction of differences between current and goal problem states). Such methods are cognitively inefficient and time consuming.

They may impose a heavy working memory load interfering with construction of

new schemas (Sweller, 1988).

In contrast, when experts in a domain encounter a familiar problem situation,

they rapidly retrieve appropriate previously acquired schemas from long-term

memory and apply them in a cognitively efficient way (Chi, et al., 1981; Larkin,

et al., 1980). Schemas allow them to categorize different problem states and

decide the most appropriate solutions. Due to their available knowledge base in

long-term memory, experts are able to avoid cognitively inefficient mental activities and perform with greater accuracy and lower cognitive loads.

Schematic knowledge structures can be described functionally by indicating

how a person with a specific level of schema acquisition would act in relevant

problem situations. For example, without any schematic knowledge of procedures

for solving the equation 4x + 2 = 3 and in the absence of any guidance, a student will

treat each symbol separately and may try to use a means-ends analysis approach

by reducing differences between a current problem state and the goal state (x = ?)

or attempt to apply various random operations to the numbers.

With some previously acquired knowledge of an appropriate procedure, another student may immediately proceed to subtract the constant 2 from both

sides of the equation: 4x + 2 – 2 = 3 – 2. The whole combination of elements (e.g.


4x + 2) will be treated as a meaningful single unit or chunk. If a student has practiced

considerably with this kind of equation, the schema for this procedure may be automated and her or his first solution step will be 4x = 1. Another, even more

experienced student may have all the relevant solution procedures well learned or

automated and would write the final answer (x = 1/4) almost immediately.

Similar examples of expert-novice differences could be demonstrated in other

areas. Each symbol in a wiring diagram could be treated as a separate element by

a novice electrician, while an experienced professional would see the whole

diagram as representing a complete system. For a non-speaker of a foreign language, a

printed text might look like a collection of unfamiliar symbols, while fluent native

readers would be able to make sense out of the whole text. They would treat

words or even combinations of words as single elements.

By combining multiple elements of information into a single chunk in

working memory, long-term memory schemas allow experts to avoid processing

overwhelming amounts of information and to effectively reduce working memory

load during high-level cognitive processing. In addition, experts are also able to

bypass working memory limitations by having many of their schemas highly

automated due to extensive practice. Human cognitive architecture has evolved in

a way that information processing changes significantly as this information becomes more familiar to an individual (Sweller, 2003). Schematic knowledge

structures held in long-term memory significantly influence the content and

characteristics of working memory by effectively transforming it into long-term

working memory (Ericsson & Kintsch, 1995).

An expert’s routine problem solving in a familiar domain usually involves a

selection of an appropriate schema, adapting it to the problem, and executing the

solution procedure. Often it occurs as a direct recognition early in the perception

of the problem (Chi, Feltovich, & Glaser, 1981). Non-routine problem solving

includes additional procedures such as search (when more than one schema is applicable to the situation) or combining the schemas (when no one schema will

cover the whole problem) (Larkin, 1985). Substantial evidence has accumulated

that a schema theory of problem solving can be successfully used to explain

experts' performance in various task domains (Reimann & Chi, 1989).

Building a problem representation is a key process in problem solving

(Larkin, 1985; McDermott & Larkin, 1978; Simon & Simon, 1978). It has been

found that experts spend more time on a qualitative analysis of the problem and

building explicit representations of the situation (for example, by drawing the

diagrams of causal relationships between the objects). Experts also form more abstract and enriched representations than novices do. For example, according to

Chi, Feltovich, and Glaser (1981), experts classify physics problems based on


abstract physics categories and principles, while novices do it according to surface

characteristics of the problem. Thus, the level of problem representation depends on the solver's problem schemas. An initial cue (first sentences in the problem

statement, etc.) may activate a particular schema that is then matched to the

problem. Any mismatch results in the rejection of that schema and triggering of

another schema.

Successful problem solving in technical domains depends on the solver's

schemas for the causal relations between components of a technical system which

allow mental simulations of the system operation (de Kleer & Brown, 1983;

Gentner & Stevens, 1983; Miyake, 1986). Providing learners with a causal

description of a device’s operation in addition to information about its

components was shown to enhance their ability to operate the device (Kieras &

Bovair, 1984; Mayer, 1989a).

Different types of schemas are appropriate for solving different types of

problems. At higher levels of skill, the choice of schematic knowledge types is

determined by higher level structures in which an expert's representations are

organized (Hegarty, 1991). Initially, problem schemas are specific to the

situations from which they were induced. With experience, they become indexed

by general principles, and problem solving becomes faster and takes less effort. Organization of the solvers' knowledge into large groups of chunks or schemas

decreases the demands on working memory and allows learners to activate

appropriate procedures. As soon as experts retrieve a problem schema, they

automatically access the procedures for solving the problem (Chi et al., 1981;

Smith, 1991).

The development of a problem representation can be viewed as a sequence of

attempts at schema refinement, which depends on the structure of the solver's

domain-specific knowledge. As a result, experts spend more time on

planning and use efficient, forward-working problem-solving processes (Reimann & Chi, 1989). Empirical studies in various domains have revealed that

problem-solving strategies are determined by the nature of the problem

representations, differences in the organization of knowledge, and the number of

domain-specific problem schemas that solvers have because of their experience in

a domain (Larkin, 1985; Lesgold, Feltovich, Glaser, & Wang, 1981).

Experts’ performance is schema-driven. Experts possess more domain-

specific schemas and can access and use them more efficiently than novices.

Experts work forward deriving the appropriate problem schema from the problem

statement. In contrast, novices’ performance is goal-driven. Novices work backward from the goal, searching for operators that will allow them to derive the

needed solution. However, working backwards is a default strategy that both


experts and novices use when there is no schema for a given type of problem. In

a novel situation, experts use various types of general heuristics together with domain-specific knowledge (Perkins, Schwartz & Simmon, 1991; Rist, 1989;

Schultz & Luchheud, 1991).
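
The contrast between working forward from the givens and working backward from the goal can be illustrated with a toy sketch. It assumes a hypothetical set of operators, each deriving one quantity from known ones (loosely modeled on kinematics formulas); the names `OPERATORS`, `work_forward`, and `work_backward` are introduced here for illustration and do not come from the cited studies.

```python
# Each hypothetical operator derives one quantity from a set of known ones.
OPERATORS = [
    ({"v0", "a", "t"}, "v"),   # e.g. v = v0 + a*t
    ({"v0", "v", "t"}, "d"),   # e.g. d = (v0 + v)/2 * t
    ({"m", "a"}, "F"),         # e.g. F = m*a
]


def work_forward(known, goal):
    """Apply whichever operator fits the givens until the goal is derived."""
    known, steps = set(known), []
    progress = True
    while goal not in known and progress:
        progress = False
        for inputs, output in OPERATORS:
            if inputs <= known and output not in known:
                known.add(output)
                steps.append(output)
                progress = True
                if output == goal:
                    break
    return steps if goal in known else None


def work_backward(known, goal, pending=()):
    """Start from the goal and set a subgoal for each missing input."""
    if goal in known:
        return []
    if goal in pending:
        return None  # circular subgoal, give up on this branch
    for inputs, output in OPERATORS:
        if output != goal:
            continue
        steps = []
        for needed in sorted(inputs):
            sub = work_backward(known, needed, pending + (goal,))
            if sub is None:
                break  # this operator cannot be satisfied
            steps += [s for s in sub if s not in steps]
        else:
            return steps + [goal]
    return None


givens = {"v0", "a", "t", "m"}
print(work_forward(givens, "d"))
print(work_backward(givens, "d"))
```

Both functions reach the same quantities here; the difference lies in what drives the search. The forward version is driven by what the givens allow, whereas the backward version must keep a stack of pending subgoals, which is the kind of bookkeeping the text associates with goal-driven novice performance.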

Thus, expert performance depends on available problem representations,

knowledge base (facts, concepts, principles, knowledge of a system, and rules for how

to use this knowledge), availability of appropriate domain-specific schemas,

general procedures (strategies, heuristics, algorithms), and relations among all

these elements (Hart, 1986; Lesgold and Lajoie, 1991). According to Chi, Glaser,

and Farr (1988), the main features of competent expert performance are:

1) domain-specificity (experts exhibit superior performance mainly in their

own domains);

2) perception of problem situations by large meaningful patterns;

3) high speed of performance;

4) superior well-organized long-term memory knowledge base;

5) deep-level and principle-based problem representations;

6) thorough qualitative analysis of problems; and

7) strong self-monitoring skills.

COGNITIVE STUDIES OF EXPERT-NOVICE DIFFERENCES AND INSTRUCTIONAL APPROACHES

Most studies of expertise have focused on discrete expert-novice differences

in solving specific tasks. Existence of a continuum between novices and experts

has been frequently ignored. As a result, our knowledge about the development of

expertise and about changes in cognitive processes as expertise is acquired is limited. Groen and Patel (1991) suggested four developmental levels: 1) novices

with no training in the domain (possessing only common sense knowledge and

everyday experience); 2) intermediates who have received some instruction in the

domain; 3) sub-experts who have expertise in a closely related domain (they may

also be viewed as intermediates); and 4) experts who are always correct in solving

routine problems and solve them by way of forward reasoning. It is impossible for

novices to learn expert approaches directly. When expert rules are taught to

beginners, they form isolated pieces of knowledge that are not retained for a long

period of time (Groen & Patel, 1991). Thus, an existing theory of expert performance cannot be applied directly to instruction, and theoretical models of

student transition from one level to another should be developed.


Expert routine problem solving is traditionally associated with using a

forward-working strategy; novices tend to work backward. In the case of unfamiliar problems, experts also use backward reasoning. The studies of Sweller

and his colleagues (Mawer & Sweller, 1982; Sweller & Levine, 1982; Sweller et

al., 1983) brought some understanding of when the switch occurs during the

development of expertise and what factors would facilitate the switch. It was

demonstrated that means-ends analysis might prevent the acquisition of problem-

specific rules because this method could leave no cognitive resources available for

meaningful learning.

Rule acquisition occurred or improved under conditions where subjects were

provided with information additional to the problem goal (for example, a set of

subgoals) or were given goal-free problems. Sweller et al. (1983) hypothesized

that the main factor responsible for this result was the kind of information a

learner focuses on during problem solving. If knowledge or schema acquisition is

an aim of problem solving, then the influence of the goal as a control mechanism

should be reduced.

In some studies, forward-reasoning intermediate-level medical students

performed more poorly than either experts or novices (Groen & Patel, 1991). This

result was explained by their dogmatic reliance on existing basic science knowledge. When students' knowledge contains misconceptions, forward

reasoning might be harmful for learning. If they reasoned backward, then the

misconceptions would be just temporary hypotheses. It was suggested that in such

cases an emphasis should be placed on self-explanations and testing their

adequacy (explanation-based learning) rather than on correct problem solving

(Groen & Patel, 1991).

Most of the experimental evidence in the area of expert-novice differences

was obtained by contrasting performance of experts and novices. Schoenfeld and

Hermann (1982) conducted one of the first longitudinal studies of the relationship between problem perception and expertise. Students' perceptions of mathematical

problems were examined before and after intensive training in mathematical

problem solving. It was demonstrated that novices sorted problems based on

surface components mentioned in the problem statement. After the training, they

sorted them in a more expert-like way according to the principles of problem

solution. Thus, problem perception and problem schemas on which such

perception is based changed as learners became more experienced in the domain.

With the development of expertise, problem schemas change in their level of

specificity (diSessa, 1983; Forbus & Gentner, 1986; Kaiser, Jonides, & Alexander, 1986). Initially induced from specific situations, they become more

general and indexed by the underlying principles (Chi et al., 1981). At higher


levels of development, schemas may also change from qualitative to quantitative,

representing relationships between components of problem situations more precisely (Forbus & Gentner, 1986; Hegarty, Just, & Morrison, 1988). As people

gain more experience with technical systems, they learn relations between their

common subsystems and learn to chunk components of systems into these

subsystems (Hegarty, 1991). New information is then assimilated into existing

sophisticated knowledge structures.

The learning mechanisms and strategies evolve as a learner becomes more

experienced (Langley & Simon, 1981). Lesgold et al. (1988) hypothesized that

early learning is perceptual and different from later cognitive learning. Experts

use schemas to interpret incoming information, intermediates often reshape their

perceptions to fit the schema, whereas novices completely rely on their

perceptions. The previously mentioned decline in performance at intermediate

levels can also be due to the shift from perceptual learning to cognitive schema-

based learning.

According to the triarchic/global/local architecture of expert cognition

(Sternberg & Frensch, 1992), when processing information from new domains, an

expert relies mostly on controlled, global processing. If information belongs to the

expert's narrow area of expertise, she or he relies mostly on automatic, local processing. Such local processing systems can operate in parallel, be automated,

and be characterized by almost unlimited processing capacity. As expertise develops,

learned portions of processing procedures are transferred to a local processing

system. This enables experts to automate more processing and thus to free global

processing resources for dealing with new situations (Sternberg & Frensch, 1992).

However, experts may be inflexible in new situations because it is difficult to

reorganize an automated schema. Experiments with bridge players confirmed that

experts were more affected when new task demands required changing deep,

abstract principles rather than surface features. Novices were more affected by surface changes than by deep, abstract changes (Sternberg & Frensch, 1992).

Nevertheless, Schraagen (1993) demonstrated that when domain-specific

knowledge is missing, experts could still maintain a more structured approach

than novices could by making use of more abstract high-level knowledge.

According to the theory of skill acquisition (Anderson, 1983), the instruction

in specific performance procedures must be preceded by the instruction in the

concepts, rules, and principles of how things work (declarative knowledge). In

addition to the theoretical principles, the ability to apply them in concrete

situations should be developed (Morris & Rouse, 1985). A procedural approach alone is not sufficient, because it is impossible to predict all possible situations in

advance, especially in complex domains like modern digital electronics. Thus,


training should combine knowledge of system principles with procedures of how

to use this knowledge in a specific context. In general, teaching expert performance might require a basic conceptual explanation of how things work,

practice in carrying out basic procedures, and variation in experiences for tuning

of procedural knowledge and the development of persistence and confidence

(Gentner & Stevens, 1983; Greeno & Simon, 1988).

Kieras and Bovair (1984) demonstrated that providing students with

conceptual models of a complex system prior to information on how to use that

system produced better recall, faster learning, and fewer errors in the operation of

the system. Combined structural and functional descriptions of system operations

are recommended for effective learning (Psotka, Massey, & Mutter, 1988).

However, specific instructional strategies should be based on the cognitive

requirements of particular tasks. The user does not always need a complete

knowledge of the system in order to be able to operate it.

For example, many experts in technical areas have a very limited

understanding of general physics principles but satisfactorily perform their duties.

If a device is simple, or a procedure is easily learned and practiced (e.g., a

telephone) there may be no need to provide a device model. The user may infer a

usable model without instruction (Kieras & Bovair, 1984). Only limited underlying knowledge and understanding of how certain functions are fulfilled are required

for operating and troubleshooting systems with simple functions. For more

complex systems, a deeper understanding of their components and operation is

required (Lesgold & Lajoie, 1991).

Novices often have difficulties integrating general theoretical concepts with

their intuitions because of conflicts between everyday meanings of new concepts

(e.g., acceleration, mass) and their meaning in theory (Reif, 1987), conflicts

between students' intuitive knowledge and theoretical laws (diSessa, 1982), or

because of the lack of procedural knowledge of solving specific problems that is often not explicitly taught (Heller & Reif, 1984).

There have been two major approaches in using the results of cognitive

research on knowledge structures in the design of instructional systems (Glaser,

1990). The first approach has been developed in the tradition of knowledge

engineering in artificial intelligence and design of expert systems. It requires

exposing the learner to the knowledge characteristics of well-developed

expertise. The well-known example of a computer-based instructional system

designed in accordance with this approach is the GUIDON project (Clancey &

Letsinger, 1984).

The second approach has been developed in cognitive science and is based on

cognitive models of students' knowledge. For example, in instructional systems


based on qualitative models (Chi, 1988; Forbus & Gentner, 1986), a learner has to

progress from simple to more sophisticated domain-specific conceptual models (e.g., coordinated functional, causal, and structural models; qualitative and

quantitative models). This progression occurs in the context of solving

specifically designed problems with gradually increasing levels of complexity. An

example of this approach is QUEST, a program for teaching troubleshooting of

electric circuits (White & Frederiksen, 1986).

Similar ideas were realized in the STEAMER project (the simulator for

training engineers to operate steam propulsion plants aboard large naval ships).

The primary goal was to teach a robust conceptual model (rather than specific

procedures) that could be used to reason about the steam plant qualitatively

(Holland, Hutchins, McCandless, Rosenstein, & Weitzman, 1987). Abstract

graphic images of the steam plant were organized in a hierarchical manner with

the major plant parameters presented first, followed by more detailed simulations

of subsystem components.

SHERLOCK is an example of a coached-practice learning environment in

which learners compare their own performance with expert performance (Gabrys,

Weiner, & Lesgold, 1993; Lesgold and Lajoie, 1991). Such reflection, however,

may place a large demand on working memory if solution paths are long or complicated. SHERLOCK supports reflection by a replay of the trainee's and an

expert's performance. During replay, the system provides a summary of the

information the user has obtained on previous steps. The system allows learners to

observe the expert's decision process, reasons behind it, and the overall goal

structure for the expert performance. This technique reduces the cognitive load

associated with remembering the details of the trainee's own performance while

observing the expert's actions (Gabrys et al., 1993).

Another well-known example of a similar approach is the model-tracing

methodology in intelligent tutoring systems (Anderson, 1993). The tutoring system simulates a student’s cognitive behavior in real time and maintains a

model of the student's knowledge state. It provides an example-based learning

environment in which students can induce rules from examples. The learner's

actual performance is compared to the ideal structure of solution (production rules

model), and the student is kept on the correct solution path. The tutor estimates

the availability of acquired productions based on their correct and incorrect

applications and selects appropriate problems for exercises. Many tutoring

programs based on the model-tracing methodology have been effectively used in

the fields of programming, geometry proofs, and solving algebraic equations (Anderson, Boyle, & Reiser, 1985; Anderson & Corbett, 1993; Anderson,


Corbett, Fincham, Hoffman, & Pelletier, 1992; Anderson, Farrell, & Sauers,

1984).
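
The core loop of model tracing, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of any of the cited tutors: the two production rules for equations of the form a*x + b = c, the `ModelTracer` class, and the smoothed-ratio mastery estimate are all hypothetical choices made for the example.

```python
from collections import defaultdict


# Hypothetical production rules for a*x + b = c, with the state held as (a, b, c).
# Each returns the next state, or None when the rule does not apply.
def subtract_constant(state):
    a, b, c = state
    return (a, 0, c - b) if b != 0 else None


def divide_by_coefficient(state):
    a, b, c = state
    return (1, 0, c / a) if b == 0 and a not in (0, 1) else None


PRODUCTIONS = {
    "subtract-constant": subtract_constant,
    "divide-by-coefficient": divide_by_coefficient,
}


class ModelTracer:
    def __init__(self):
        self.correct = defaultdict(int)
        self.attempts = defaultdict(int)

    def trace(self, state, student_next_state):
        """Compare a student step with the step the ideal model would take.

        Returns (rule_name, expected_state); rule_name is None when the
        student's step leaves the solution path.
        """
        for name, rule in PRODUCTIONS.items():
            expected = rule(state)
            if expected is None:
                continue  # this production does not apply in this state
            self.attempts[name] += 1
            if student_next_state == expected:
                self.correct[name] += 1
                return name, expected
            return None, expected  # off path: the tutor can show the expected step
        return None, state  # no production applies (problem already solved)

    def mastery(self, name):
        """Smoothed share of correct applications (an assumed, simple estimate)."""
        return (self.correct[name] + 1) / (self.attempts[name] + 2)


tracer = ModelTracer()
print(tracer.trace((2, 3, 11), (2, 0, 8)))   # correct first step
print(tracer.trace((2, 0, 8), (1, 0, 5)))    # incorrect second step
print(round(tracer.mastery("subtract-constant"), 2))
```

A tutor built along these lines could use the `mastery` values to choose the next exercise, for example by selecting problems that exercise the production with the lowest estimate.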

COGNITIVE MODELS OF DEVELOPMENT OF EXPERTISE AND INSTRUCTIONAL DESIGN

Cognitive studies of human performance and learning have demonstrated that

learning processes are supported by a basic cognitive architecture that includes a

powerful long-term memory and a limited working memory. Schema acquisition

and automation as the major learning mechanisms are critical in intellectual skills

formation. Studies of chess skills and other domains indicate that our knowledge

base provides the foundation of intellectual skills. Schemas held in long-term

memory allow experts to avoid processing overwhelming amounts of information

in working memory and thus by-pass working memory limitations.

Automatic processin
