Cognitive Load Factors.pdf


  • 8/18/2019 Cognitive Load Factors.pdf

    1/116


     

    COGNITIVE LOAD FACTORS IN

    INSTRUCTIONAL DESIGN FOR

    ADVANCED LEARNERS 

     No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or 

     by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no

    expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No

    liability is assumed for incidental or consequential damages in connection with or arising out of information

    contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in

    rendering legal, medical or any other professional services.


     

    COGNITIVE LOAD FACTORS IN INSTRUCTIONAL DESIGN FOR

    ADVANCED LEARNERS 

    SLAVA KALYUGA 

    Nova Science Publishers, Inc. New York


     

    Copyright © 2009 by Nova Science Publishers, Inc.

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system

    or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape,

    mechanical photocopying, recording or otherwise without the written permission of the

    Publisher.

    For permission to use material from this book please contact us:

    Telephone 631-231-7269; Fax 631-231-8175

    Web Site: http://www.novapublishers.com

    NOTICE TO THE READER

    The Publisher has taken reasonable care in the preparation of this book, but makes no

    expressed or implied warranty of any kind and assumes no responsibility for any errors or

    omissions. No liability is assumed for incidental or consequential damages in connection

    with or arising out of information contained in this book. The Publisher shall not be liable

    for any special, consequential, or exemplary damages resulting, in whole or in part, from

    the readers’ use of, or reliance upon, this material.

    Independent verification should be sought for any data, advice or recommendations

    contained in this book. In addition, no responsibility is assumed by the publisher for any

    injury and/or damage to persons or property arising from any methods, products,

    instructions, ideas or otherwise contained in this publication.

    This publication is designed to provide accurate and authoritative information with regard

    to the subject matter covered herein. It is sold with the clear understanding that the

    Publisher is not engaged in rendering legal or any other professional services. If legal or

    any other expert assistance is required, the services of a competent person should be

    sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF

    PUBLISHERS.

    LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA 

    ISBN: 978-1-60741-685-2 (E-Book) 

    Available upon request

    Published by Nova Science Publishers, Inc.              New York


     

    CONTENTS 

    Preface

    Chapter 1 Basic Architecture of Human Cognition

    Chapter 2 Cognitive Studies of Expert-Novice Differences and Design of Instruction

    Chapter 3 Cognitive Load Perspective in Instructional Design

    Chapter 4 Cognitive Load Principles in Instructional Design for Advanced Learners

    Summary Toward a Cognitively Efficient Instructional Technology for Advanced Learners

    Index


     

    PREFACE 

    The empirical evidence described in this book indicates that instructional

    designs and procedures that are cognitively optimal for less knowledgeable

    learners may not be optimal for more advanced learners. Instructional designers or

    instructors need to accurately evaluate learners' levels of expertise to design or

    select optimal instructional procedures and formats. Frequently, learners need to

     be assessed in real time during an instructional session in order to adjust the

    design of further instruction appropriately. Traditional testing procedures may not

     be suitable for this purpose. The following chapters describe a cognitive load

    approach to the development of rapid schema-based tests of learner expertise. The

     proposed methods of cognitive diagnosis will be based on contemporary

    knowledge of human cognitive architecture and will be further used as a means of

    optimizing cognitive load in learner-tailored computer-based learning

    environments.


     

    Chapter 1

    BASIC ARCHITECTURE OF HUMAN

    COGNITION 

    A cognitive approach to human learning emphasizes the internal cognitive

    mechanisms of learning. Such mechanisms are usually described as

    transformations performed on various mental representations of situations and

    tasks. An important assumption of the approach is that a single general cognitive system underlies human cognition. Different theoretical approaches specify this

    general cognitive system as corresponding cognitive architectures. The

    understanding of human cognition within a cognitive architecture requires

    knowledge of corresponding models of memory organization, forms of knowledge

    representation, mechanisms of problem solving, and the nature of human

    expertise.

    MEMORY ORGANIZATION 

    The major characteristics of human memory are its strength or durability,

    capacity (number of items of information stored in memory), and speed of access.

    According to these characteristics, memory is divided into long-term memory and

    short-term memory. Long-term memory (LTM) is characterized by high strength

    and includes well-learned knowledge, for example, the name of the first US

    President, 5 x 5 = 25, or the spelling of the word  potatoes. It is presumed to have

    unlimited capacity, although access to the stored information can be slow.

    Both the strength of memory and the speed of access increase with practice. More


    fully elaborated and more deeply processed material results in better long-term

    memory. 

    Short-term memory (STM), on the other hand, includes information that has

     been just encoded from sensory registers or retrieved from long- term memory,

    for example, what were you thinking about just before this? What are you
    thinking about when dialing the phone number 8344 2124? The durability of

    STM is a matter of seconds (Peterson & Peterson, 1959), and information in STM

    could be accessed very rapidly. The number of items of information that can be

    maintained in an active state simultaneously in STM is about seven units for most

     people (Miller, 1956). For example, it is very difficult for us to recall more than

    approximately seven serially presented random numbers (e.g., an unfamiliar

     phone number) a few seconds after we hear or see them, unless the numbers have

     been intentionally rehearsed. When asked to copy strings of digits from one page

    to another, we usually do this by grouping the digits into easily manageable units of

    three or four at a time.

    The most generally specified basic human cognitive architecture includes

    these two substructures (STM and LTM). Examples are the standard model 

    (Newell & Simon, 1972) and modal model (Atkinson & Shiffrin, 1968; Waugh &

     Norman, 1965). In more specific models, these substructures might be regarded either as a single memory store with different modes of activation for long-term

    and short-term components, or as separate memory stores. These distinctions are

    not essential when considering the basic level of cognitive architecture. However,

    in order to explain human cognition, this general model needs to be supplemented

     by some attention control mechanism (central processor or central executive)

    which determines what information from sensory stores or LTM is brought into

    STM. The information that is actually attended to is limited to a small number of

    chunks in STM (Simon, 1979; Ericsson & Simon, 1993a, 1993b).

    Various cognitive architectures and elaborations of the general model extend the described memory structure. For example, the concept of working memory

    (WM) was introduced to account for processing of units of information that are

    interconnected, rather than random, and should be processed concurrently because

    of the nature of things they reflect or due to established associations in long-term

    memory. Working memory is defined as "a system for the temporary holding

    and manipulation of information during the performance of a range of cognitive

    tasks" (Baddeley, 1986, p. 34), a “desktop of the brain … that keeps track of what

    we are doing or where we are moment to moment, that holds information long

    enough to make a decision, to dial a telephone number, or to repeat a strange foreign word that we have just heard” (Logie, 1999, p. 174). Some simple

    examples of working memory operation could be provided by the following tasks:


    close your eyes and pick up a pen in front of you; count the number of windows in

    your house or apartment; mentally rearrange the furniture in your room, or mentally complete a mathematical operation (for more examples, see Logie,

    1999).

    After incoming stimuli from an external source are registered in sensory

    memory, perceived or matched to recognizable patterns by using prior knowledge

    (if any) in LTM and context, and are paid attention to, they are transferred into

    WM. If a unit of information is not recognized due to the lack of appropriate LTM

     patterns, it can still be attended to and processed in WM, with appropriate

    cognitive resources allocated for the task. Attended units of information in WM

    are assigned meaning and used for constructing integrated mental representations

    of a situation or task (Figure 1). This information, however, may fade very

    quickly if attention is diverted or if the capacity of WM is overloaded.

    Baddeley and Hitch (1974) first proposed that WM performed both

     processing and storage functions. They suggested three structural components of

    working memory: a central executive and two separate auditory and visual stores

    for handling verbal information and visual images. These two stores serve as

    maintenance systems controlled by the central executive and are called

    respectively an articulatory or phonological loop (‘inner voice’) and a visuospatial sketchpad (‘inner eye’). The limited capacity of the central executive is used for

     processing incoming information, with the remainder used for the storage of

    intermediate and final products of that processing. Storage and processing

    capabilities of WM trade off against each other. When memory load increases

    above some threshold, our performance could be inhibited. To get a feeling of

    WM limitations, try to mentally add two large numbers (for example, 83 468 437

    and 93 849 040). For a concurrent task, you may try also to attend simultaneously

    to a comedy show on your TV. It would be very difficult to do because each of

    these activities alone may take all of your WM resources.

    There are three major functional aspects of working memory operation:

    temporary storage, manipulation of information, and executive control.

    Temporary storage of information was the focus of classic models of STM and

    was studied using standard word or digit STM span tasks. These were simple

    tasks involving recalling a list of digits or unrelated words and not requiring much

     prior knowledge. Active manipulation of information has been the focus of

    models of WM and has been studied using WM span tests that require concurrent

     processing of several tasks. These are relatively more complex tasks involving

    meaningful cognitive operations such as reading sentences or performing numerical transformations, and then recalling the final words of those sentences or

    results of the math operations. Performance of complex cognitive tasks requires


    simultaneous use and integration of various sources of information, coordination

    of separate processes and representations. It is the executive functioning of WM and the interactions between WM and LTM knowledge structures that have become the

    focus of research in recent years (see Miyake & Shah, 1999, for a recent overview

    of WM models and the state of the field).

    A number of hypotheses have been proposed to explain individual differences

    in WM capacity and its relation to performance. These theories considered

    differences in total WM capacity, differences in processing efficiency of WM, or

     both. According to the total capacity approach (Baddeley & Hitch, 1974; Cantor

    & Engle, 1993; Case, 1985; Engle, Cantor, & Carullo, 1992), all cognitive

     processes require resources from a fixed pool. Any resources not allocated to the

    operations can be used for short-term storage. The storage and processing

    capabilities of working memory trade off against each other. When memory load

    increases above some threshold, a person’s performance may decline. A change in

    total capacity caused, for example, by fatigue or age should affect the

     performance in a wide range of tasks.
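    The trade-off assumed by the total capacity approach can be made concrete with a toy calculation. The sketch below is illustrative only: it assumes a single pool of arbitrary size (the hypothetical constant POOL, set to 10 units), a fixed storage cost per item, and that processing is served before storage. None of these numbers come from the studies cited above.

```python
# Toy model of the total capacity view: processing and storage draw on one
# fixed pool. POOL and the per-item cost are hypothetical, arbitrary units.
POOL = 10.0


def storage_left(processing_demand: float, pool: float = POOL) -> float:
    """Capacity not consumed by processing remains available for storage."""
    return max(pool - processing_demand, 0.0)


def items_retained(items_to_hold: int, processing_demand: float,
                   cost_per_item: float = 1.0, pool: float = POOL) -> int:
    """How many of the items fit into whatever capacity processing leaves free."""
    free = storage_left(processing_demand, pool)
    return min(items_to_hold, int(free // cost_per_item))


# Same storage demand (6 items), increasing processing demand.
for demand in (2, 5, 8):
    print(f"processing={demand}: items retained={items_retained(6, demand)}")

# A smaller pool (standing in, e.g., for fatigue) lowers retention on every task.
print(items_retained(6, 2, pool=7.0))
```

    Under these assumptions, shrinking the pool hurts performance on any task, which is the prediction the total capacity approach makes; the task-specific and processing efficiency views would instead change the processing cost of a particular task.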

    [Figure 1 diagram. Components shown: Sensory Memory (incoming information); Working Memory (constructing mental representations of a situation or task); Long-Term Memory (knowledge base).]

    Figure 1. Basic architecture of human cognition.


    The task-specific hypothesis (Daneman & Carpenter, 1980) assumed that

    WM capacity is specific to the particular task being performed. Efficient processing skills leave more WM capacity for storage of processing products. A

    change in processing efficiency should be specific to a particular task and result

    from intensive practice or training (Just & Carpenter, 1992). Performance would

     be influenced only if available resources are in short supply when a person

    operates at the limit of WM capacity. The processing efficiency approach assumes

    that a single central system is responsible for the processing and temporary

    storage of information. Its limited capacity must be shared between the processing

    and the storage demands. Individuals with inefficient processes have a

    functionally smaller storage capacity because they must allocate more resources to

    the processes (Daneman & Carpenter, 1983; Daneman & Tardif, 1987).

    Working memory capacity was measured in terms of operational capacity

    dependent on the type of specific background task used in a particular domain

    (Carpenter & Just, 1989). For example, the reading span test was used to measure

    WM capacity as the largest size of the set of simple sentences from which a

    subject can reliably recall the final words of all the sentences (Daneman &

    Carpenter, 1983). Daneman and Tardif (1987) established that the reading span

    was a measure specific to language skills, not a measure of general working memory capacity, and it correlated significantly with reading comprehension

    ability.

    Although there obviously are systematic differences among individuals in

    their working memory capacity for specific tasks, and these differences influence

     performance when the person operates at the limit of his or her working memory

    capacity, no single approach or hypothesis concerning the interpretation of

    individual differences in WM capacity has received convincing empirical support.

    Such differences could be strongly influenced by knowledge structures available

    in long-term memory. Any WM span implicitly reflects an individual's knowledge and experience in a domain, and this knowledge inevitably influences his or her

     performance in both processing and storage parts of the task (e.g., Hulme,

    Maughan, & Brown, 1991; Hulme, Roodenrys, Brown, & Mercer, 1995). WM

    span measures thus could be used as predictors of the person’s performance in the

    corresponding domain rather than measures of his or her true general WM

    capacity. It is practically impossible to eliminate the influence of the person’s

    knowledge base when meaningful tasks are involved in WM span tests. From this

     point of view, approaches that focus on connections between the content and

    operation of working memory and long-term memory could be more relevant and productive.


    Simple chunking mechanisms provide an example of using long-term

    memory structures in transforming the content of working memory. A chunk is a familiar unit of information based upon previous learning. For example, it could

     be difficult to remember and recall a string of random letters like

    B,B,C,C,I,A,A,B,C,F,B,I, unless we chunk them together into BBC, CIA, ABC,

    FBI. In this case, we use our prior knowledge stored in LTM to reduce the number

    of elements to a manageable four chunks. The same method could be used with

    the following string of numbers: 1,9,1,4,1,9,4,5,1,9,9,6,2,0,0,1. Another common

    example of chunking in language comprehension is the way we chunk letters into

    familiar words, and words into familiar phrases. An STM capacity estimate of

    around seven units (Miller, 1956) actually indicates the number of chunks rather

    than the total amount of information stored in STM. This mechanism explains how

    we manage to get around the information-processing bottleneck created by our

    limited working memory capacity, and to learn the enormous amount of

    knowledge in our LTM.
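    The chunking examples above can be imitated with a simple greedy procedure: at each position, take the longest run of items that matches a familiar unit, otherwise keep a single item. This is a minimal sketch, not a model of how people actually chunk. The set of known chunks is a hypothetical stand-in for LTM knowledge, and reading the digit string as the years 1914, 1945, 1996 and 2001 is an assumption, since the text does not say which chunks are intended.

```python
from typing import List, Sequence, Set


def chunk(items: Sequence[str], known_chunks: Set[str], max_len: int = 4) -> List[str]:
    """Greedily group consecutive items into the longest familiar chunk.

    known_chunks is a hypothetical stand-in for prior knowledge in LTM;
    items that start no familiar chunk are kept as single-item chunks."""
    chunks: List[str] = []
    i = 0
    while i < len(items):
        for size in range(min(max_len, len(items) - i), 0, -1):
            candidate = "".join(items[i:i + size])
            if size == 1 or candidate in known_chunks:
                chunks.append(candidate)
                i += size
                break
    return chunks


letters = "B,B,C,C,I,A,A,B,C,F,B,I".split(",")
print(chunk(letters, {"BBC", "CIA", "ABC", "FBI"}))
# ['BBC', 'CIA', 'ABC', 'FBI']: 12 letters become 4 chunks

digits = "1,9,1,4,1,9,4,5,1,9,9,6,2,0,0,1".split(",")
print(chunk(digits, {"1914", "1945", "1996", "2001"}))
# ['1914', '1945', '1996', '2001']
```

    With an empty set of known chunks, the same 12 letters stay as 12 separate units, which is roughly the situation of a learner with no relevant prior knowledge.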

    People can be trained to effectively increase their memory capacity to an

    amazing degree through extensive training in chunking and re-chunking

    information into meaningful units using their prior knowledge stored in LTM. The

    skilled memory theory (Chase & Ericsson, 1982) claims that people develop mechanisms that enable them to use a large and familiar knowledge base to

    rapidly encode, store, and retrieve information within the area of their expertise

    and thus circumvent the working memory capacity limitations. As a result, experts

     possess an enhanced functional working memory capacity in domains of their

    expertise (Ericsson & Staszewski, 1989).

    Available domain-specific knowledge enables experts to quickly encode and

    retain large amounts of information in LTM. Such LTM storage and retrieval

    operations speed up with practice and are comparable with STM encoding and

    retrieval, resulting in experts' superior task performance and superior recall for familiar materials (the skilled memory effect; Ericsson & Staszewski, 1989). For

    example, expert mnemonists can increase their digit spans far beyond the limit of

    Miller's seven plus-or-minus two digits. They use familiar chunks of knowledge

    in LTM to encode new information in an easily accessible form. Ericsson and

    Staszewski (1989) described a person who expanded his digit span to 84 digits by

    grouping them into short sequences and encoding them as athletic running times,
    dates, and ages that were familiar to him. He nevertheless operated under the

    constraints of limited-capacity STM: the size of digit groups never exceeded five

    digits, and these groups were never clustered into supergroups of more than four groups.
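    The size limits reported for this subject can be illustrated with a two-level grouping: digits are split into small groups, and groups are packed into supergroups. This is a sketch under stated assumptions: the group size of 4 is an assumption (only the upper limit of five digits is reported), the 84-digit string is a placeholder, and the meaningful recoding of each group as a running time, date, or age is not modelled at all.

```python
from typing import List

MAX_GROUP = 5        # largest digit group reported for the trained subject
MAX_SUPERGROUP = 4   # largest number of groups reported per supergroup


def encode(digits: str, group_size: int = 4) -> List[List[str]]:
    """Split digits into small groups, then pack the groups into supergroups.

    Only the size limits are modelled; the meaningful recoding of each group
    (running times, dates, ages) is not."""
    if not 1 <= group_size <= MAX_GROUP:
        raise ValueError(f"group_size must be between 1 and {MAX_GROUP}")
    groups = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    return [groups[i:i + MAX_SUPERGROUP]
            for i in range(0, len(groups), MAX_SUPERGROUP)]


digits = ("0123456789" * 9)[:84]   # placeholder 84-digit sequence
structure = encode(digits)
print(len(structure), [len(sg) for sg in structure])  # 6 [4, 4, 4, 4, 4, 1]
```

    Although 84 digits are held in total, no unit at any level exceeds a few elements, which is consistent with the claim that the subject still worked within limited-capacity STM.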


    In the WM model of Carpenter and Just (1989), the operation of WM during

    reading comprehension is also based on relations between WM and LTM. In this model, WM consists of currently active pointers to LTM structures and partial or

    final products of processing. A reader stores the theme of the text, the general

    representation of the situation, the major propositions from preceding sentences,

    as well as a representation of the sentence he or she is currently reading (Just &

    Carpenter, 1992). When dealing with an unstructured series of words, we can

    usually recall only six or seven unrelated words in order (according to our STM

    span). Skillful readers, on the other hand, can recall and understand long

    sentences (about 77% of words in up to 22-word sentences) because they use

    internal structures in LTM to circumvent WM limitations. Thus, sentence

    comprehension can be considered as recoding (chunking) incoming symbols into

    some structure (Carpenter & Just, 1989).

    Ericsson and Kintsch (1995) further developed these ideas into the theory of

    long-term working memory (LT-WM). In this theory, LTM knowledge structures

    associated with components of working memory form a LT-WM structure that is

    capable of holding a virtually unlimited amount of information. Additional
    mechanisms were introduced to overcome the effects of interference in experts'
    use of LTM knowledge for storage and retrieval of newly encoded information.
    The proposed mechanism of LT-WM operation involves cue-based retrieval of
    information from LTM. The encoding method can be based on a

    specifically constructed retrieval structure, an elaborated existing memory

    structure, or a combination of the two. Skilled performance depends on domain-

    specific knowledge structures relevant to particular tasks, and, consequently, there

    are individual differences in the operation of LT-WM for a given task (Ericsson &

    Kintsch, 1995).

    KNOWLEDGE REPRESENTATIONS 

    Our knowledge base in LTM profoundly influences cognitive processes in

    most situations. Therefore, forms of knowledge representations are critical for

    understanding human cognition. Several major ways of representing the meaning

    of information in memory have been suggested: propositional representations

    (semantic networks), procedural representations (production systems), and

    schemas. Analogical representations or mental models (Rumelhart & Norman,

    1983) can generally be considered schemas. The concept of a proposition denotes the primitive unit of meaning: the smallest unit of knowledge
    that can be judged true or false. Networks of such


    interconnected units can be used to represent the meaning of sentences and

     pictures. 

     Newell and Simon (1972) suggested that knowledge could be represented by

    a set of conditional rules, or productions, of the form condition → action. The production rules

    are stored in long-term memory and are retrieved and used in working memory.

    The current contents of working memory are matched against the conditions of all

    the production rules in long-term memory. Whenever the conditions of a rule

    occur in working memory, the rule is triggered and its action is carried out. Action

    of the rule can change the contents of working memory and determine which rule

    is triggered next. Thus, the principles determining how one rule is followed by

    another are built into the rules themselves.
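    The recognize-act cycle just described can be sketched in a few lines. Working memory is modelled as a set of symbolic elements; each rule has a set of condition elements and an action that returns a new working memory. The rules for an equation of the form ax = b are hypothetical. When several rules match, this sketch simply fires the first one in the list; that choice is an assumption (the ACT* account described below selects the most active match instead).

```python
from typing import Callable, FrozenSet, List, Tuple

WM = FrozenSet[str]
# A production: (name, condition elements that must all be in WM, action on WM).
Rule = Tuple[str, FrozenSet[str], Callable[[WM], WM]]


def run(rules: List[Rule], wm: WM, max_cycles: int = 20) -> Tuple[List[str], WM]:
    """Recognize-act cycle: match WM against every rule's conditions, fire one
    matching rule, let its action rewrite WM, and repeat until nothing matches."""
    fired: List[str] = []
    for _ in range(max_cycles):
        matching = [rule for rule in rules if rule[1] <= wm]
        if not matching:
            break
        name, _, action = matching[0]   # assumption: the first matching rule wins
        wm = action(wm)
        fired.append(name)
    return fired, wm


# Hypothetical rules for solving an equation of the form ax = b.
rules: List[Rule] = [
    ("divide", frozenset({"goal:solve", "form:ax=b"}),
     lambda wm: (wm - {"form:ax=b"}) | {"form:x=b/a"}),
    ("compute", frozenset({"form:x=b/a"}),
     lambda wm: (wm - {"form:x=b/a"}) | {"answer:x"}),
    ("stop", frozenset({"goal:solve", "answer:x"}),
     lambda wm: (wm - {"goal:solve"}) | {"done"}),
]

fired, final_wm = run(rules, frozenset({"goal:solve", "form:ax=b"}))
print(fired)             # ['divide', 'compute', 'stop']
print(sorted(final_wm))  # ['answer:x', 'done']
```

    No rule names its successor: the sequence divide, compute, stop emerges only because each action changes working memory so that a different rule's conditions are met next.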

    One of the most advanced theories based on the idea of production rules, the

    ACT* theory (Adaptive Control of Thought; Anderson, 1983), or its updated

    version ACT-R (R for rational; Anderson, 1993), suggests a separate type of long-term
    memory for production rules (for skills) in addition to the declarative

    memory (propositions, images, and other representations for facts and

    experiences). The items in these memories can vary in their degree of ‘activity’. If

    the contents of working memory match more than one rule in procedural memory

    then whichever is the most active is triggered.

    The concept of a schema, originally discussed by Bartlett (1932), came into

    cognitive psychology from research in artificial intelligence (Minsky, 1975;

    Bobrow & Winograd, 1977). Schemas generally represent the object as a set of

    attributes (slots). Schemas abstract generalizations about objects from specific

    instances, encode general categories and typical features. They may include not

    only propositions, but also perceptual features (for example, spatial images) and

    stereotypic sequences of events. Schemas may have slots with fixed or variable

    values; slots with variable values usually have some default or most probable

    values.

    The most important features of schemas are stable patterns of relationships

     between variables (slots). Each schema contains information about some class of

    structures. When particular values are assigned to slots of a schema, a schema-

     based knowledge structure could be obtained in the form of concepts,

     propositions, etc. The obtained knowledge structures could be more general or

    more specific depending on those values. Multiple schemas can be linked together

    and organized into sophisticated hierarchical structures where one schema can

    form part of a more complex schema.
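    The slot structure described in the two paragraphs above (fixed slots, variable slots with default or most probable values, and schemas nested inside other schemas) can be sketched as a small data structure. This is a minimal illustration assuming slots can be represented as named key/value pairs; the class and the bird and wing examples are hypothetical and not taken from the schema literature cited.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Schema:
    """Hypothetical minimal schema: fixed slots, variable slots with default
    (most probable) values, and sub-schemas contained as parts."""
    name: str
    fixed: Dict[str, Any] = field(default_factory=dict)
    defaults: Dict[str, Any] = field(default_factory=dict)
    parts: Dict[str, "Schema"] = field(default_factory=dict)

    def instantiate(self, **values: Any) -> Dict[str, Any]:
        """Assign values to variable slots; unfilled slots keep their defaults."""
        for slot in values:
            if slot in self.fixed:
                raise ValueError(f"slot {slot!r} is fixed in schema {self.name!r}")
            if slot not in self.defaults:
                raise KeyError(f"schema {self.name!r} has no slot {slot!r}")
        return {**self.fixed, **self.defaults, **values}


wing = Schema("wing", defaults={"feathered": True})
bird = Schema("bird", fixed={"has_feathers": True},
              defaults={"can_fly": True, "colour": None},
              parts={"wing": wing})

print(bird.instantiate(colour="brown"))
# {'has_feathers': True, 'can_fly': True, 'colour': 'brown'}
print(bird.instantiate(can_fly=False, colour="black and white"))
# {'has_feathers': True, 'can_fly': False, 'colour': 'black and white'}
print(bird.parts["wing"].instantiate())  # {'feathered': True}
```

    Assigning different values to the same schema yields more general or more specific knowledge structures, while the default for can_fly fills in what was not stated.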

    Schemas may represent knowledge of all kinds and levels: from individual letters (allowing us to recognize different variations of handwritten letters) to

    complex electronic or organizational systems, behavioral patterns, visual and


    auditory perceptual images. For example, our schema for a human face includes

    slots for eyes, a nose, a mouth, ears, etc. These components are arranged in a certain configuration that is not a rigid one. However, some general requirements

    should be met: the nose and eyes should be located above the mouth; eyes should

     be located above the nose on different sides of it, etc. This general schema allows

    us to recognize instances of human faces in limitless situations, including some

     peculiar forms of visual arts.
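    The point that the face schema constrains relations rather than exact positions can be shown with a small check. This sketch assumes that each feature is reduced to a single (x, y) point with y growing upward; the feature names and coordinates are hypothetical. Any arrangement that satisfies the relational requirements is accepted, however distorted.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y); assumption: y grows upward


def fits_face_schema(features: Dict[str, Point]) -> bool:
    """Check the face schema's relational constraints, not exact positions."""
    required = {"left_eye", "right_eye", "nose", "mouth"}
    if not required.issubset(features):
        return False
    (lx, ly), (rx, ry) = features["left_eye"], features["right_eye"]
    nx, ny = features["nose"]
    _, my = features["mouth"]
    eyes_above_nose = ly > ny and ry > ny
    nose_above_mouth = ny > my            # so the eyes are above the mouth too
    eyes_either_side = min(lx, rx) < nx < max(lx, rx)
    return eyes_above_nose and nose_above_mouth and eyes_either_side


typical = {"left_eye": (-1, 2), "right_eye": (1, 2), "nose": (0, 1), "mouth": (0, 0)}
caricature = {"left_eye": (-3, 5), "right_eye": (2, 4.5), "nose": (0.5, 1), "mouth": (0, -2)}
scrambled = {"left_eye": (-1, 0), "right_eye": (1, 0), "nose": (0, 1), "mouth": (0, 2)}
print(fits_face_schema(typical), fits_face_schema(caricature), fits_face_schema(scrambled))
# True True False
```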

    A student’s schema for solving linear algebraic equations of the type ax = b

    may include three slots: 1) a number b on the right hand side of the equation; 2) a

    number a  on the left hand side of the equation; and 3) the division operation:

    divide the content of the first slot by the content of the second slot. For less

    experienced students, the schema may include the operation of dividing both sides

    of the equation by the same number a. In this case, the schema would contain

    slots for both parts of the equation, the dividing number a, and the division

    operation.
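    The two versions of the ax = b schema can be contrasted directly. In the sketch below, which assumes integer coefficients and a non-zero a, the experienced version fills two number slots and applies one division, while the less experienced version also keeps both sides of the equation and the explicit "divide both sides by a" step. Both give the same answer; the second simply involves more elements to hold and process. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def solve_expert(a: int, b: int) -> Fraction:
    """Experienced schema: two number slots (a, b) and one operation, x = b / a."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return Fraction(b, a)


def solve_novice(a: int, b: int) -> Tuple[Fraction, List[str]]:
    """Less experienced schema: slots for both sides of the equation and the
    divisor a, with the division applied to both sides as an explicit step."""
    if a == 0:
        raise ValueError("a must be non-zero")
    steps = [f"{a}x = {b}",
             f"{a}x / {a} = {b} / {a}",   # divide both sides by a
             f"x = {Fraction(b, a)}"]
    return Fraction(b, a), steps


print(solve_expert(3, 12))   # 4
x, steps = solve_novice(3, 12)
print(x, steps)  # 4 ['3x = 12', '3x / 3 = 12 / 3', 'x = 4']
```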

    For an example of higher-level schematic knowledge representations,

    consider the technical domain that includes knowledge about various technical

    objects (e.g., tools, devices, machines, technological procedures). This variety of

    knowledge in any technical area could be represented with different levels of specification: from descriptions of general features to specific details. A

    schematic framework for representing knowledge about a technical object may

    include three main interconnected components that could be referred to as

    functional, operational, and structural descriptions. Any technical object could be

    characterized by some functions or purpose it was designed for (what is this

    object for?), processes utilized in the object’s operation (how does it operate?),

    and the object’s internal structure including links between its components (what

    does it consist of?). To explain an object’s operation means to explain why a

    given set of linked parts performs specific functions utilizing certain processes during operation. A learner should establish connections between functional,

    operational, and structural components of the object’s description in order to

    understand how it works (Kalyuga, 1984; 1990).
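    The functional, operational, and structural descriptions, and the connections between them, can be sketched as a small record that reports which parts of the description are still unconnected. This is a hypothetical illustration: the field names, the gap check, and the bicycle pump example are assumptions made here and are not part of the framework as published.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TechnicalObject:
    """Hypothetical three-part description of a technical object."""
    functions: Set[str]    # what is this object for?
    processes: Set[str]    # how does it operate?
    components: Set[str]   # what does it consist of?
    realised_by: Dict[str, Set[str]] = field(default_factory=dict)  # function -> processes
    carried_by: Dict[str, Set[str]] = field(default_factory=dict)   # process -> components

    def gaps(self) -> List[str]:
        """List descriptions not yet connected to the next component."""
        missing = [f"function {f!r} has no process"
                   for f in sorted(self.functions) if not self.realised_by.get(f)]
        missing += [f"process {p!r} has no components"
                    for p in sorted(self.processes) if not self.carried_by.get(p)]
        return missing


pump = TechnicalObject({"inflate tyres"}, {"compress air", "one-way air flow"},
                       {"cylinder", "piston", "valve"})
print(len(pump.gaps()))   # 3: isolated facts, no connections yet

pump.realised_by["inflate tyres"] = {"compress air", "one-way air flow"}
pump.carried_by["compress air"] = {"cylinder", "piston"}
pump.carried_by["one-way air flow"] = {"valve"}
print(pump.gaps())        # []
```

    Knowing the lists of functions, processes, and parts is not yet understanding how the object works; in this sketch, that corresponds to an empty gap list once every function is linked to processes and every process to components.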

    Gruber and Russell (1996) suggested similar classes of an artifact description:

    structure (the physical and/or logical composition of an artifact in terms of the

    composition of parts and connection topologies), behavior (something an artifact

    might do in terms of observable states or changes), function (effect or goal to

    achieve by artifact behavior), requirements (prescriptions concerning the

    structure, behavior, and/or function that the artifact must satisfy), and objectives (specifications of desired properties of the artifact other than pure functions, such


    as cost and reliability). Requirements and objectives could generally be included
    in the functional description (as functional requirements and general functions).

    [Figure 2 diagram. Levels shown: functions of the object; alternative combinations of processes realizing a set of functions; alternative technical solutions realizing a combination of processes.]

    Figure 2. General schematic structure of technical knowledge.

    Each of the above aspects of technical knowledge may have different levels of
    generalization. It is possible to describe an object in very general terms (a global
    level or general overview) or in more detail with different levels of specification.
    When combined together, all aspects, components, and levels of the description of
    a technical object create a sophisticated multilevel hierarchical schematic structure of technical knowledge. In an abstract form, this structure could be

    represented by the graph in Figure 2. Three levels of description are shown for


    functions, processes, and structural components of a technical object. Simple and

    superficial knowledge about the object may include only isolated components corresponding to the upper rows in the depicted clusters of knowledge elements.

    Further deepening of knowledge requires establishing relations between these

    components and adding elaborated knowledge on more specific levels of

    description.

    There are many definitions of schemas depending on the theoretical

     perspective of the researcher. It is practically impossible to precisely describe the

    schematic knowledge structures held by an individual. As Norman (1983) noted,

    "we must … discard our hopes of finding neat, elegant mental models, but instead

    learn to understand the messy, sloppy, incomplete, and indistinct structures that

     people actually have" (p. 14). In general, a schema can be described functionally

    as a cognitive construct (an organized knowledge structure) that allows people to

    classify information according to the manner in which it will be used (e.g., Chi,

    Glaser, & Rees, 1982; Sweller, 1993). Such organized knowledge structures

    represent a major mechanism for extracting meaning from new information,

    acquiring and storing knowledge, circumventing the limitations of working

    memory, increasing the strength of memory, and recalling information. They

    impose an organization on the information, guide retrieval, and provide connections to prior knowledge.

    In schema theory, the process of learning can be considered as encoding new

    information in terms of existing schemas, as schema modification, or as the

    creation of new schemas. The creation or modification of a schema is based on

    conscious cognitive processing of information in working memory. In a more

    general context, schema acquisition could be regarded as an example of a non-

    linear process where the schema emerges from lower-level components during

    learning or practice. As a cognitive unit, the schema represents a higher level of

    organization than just a simple collection of lower-level components.

    The need for the emergence of higher levels of schema hierarchy could be

    associated with general limitations of human information processing. In a wider

    context, any qualitatively new level of a system emerges in a non-linear way as a

    means to overcome the combinatorial barrier caused by the immense number of

     possible combinations of the variety of elements of the previous, lower level.

    Examples of such processes are the emergence of the molecular level from atoms,

     biochemical structures from molecules, or nerve impulses from biochemical

    structures (Scott, 1995; Turchin, 1977). Structured neuronal groups might

    represent the qualitatively new biological level of conscious cognitive functioning (Edelman, 1992). On the psychological level of description, our abstract high-level schematic knowledge representations in long-term memory (and


    corresponding intellectual abilities associated with operating such structures)

    might have emerged as a means of overcoming the combinatorial barrier under conditions of limited processing capacity.

    Because a schema is treated as a single unit in working memory, such high-

    level structures require less working memory capacity for processing than the

    multiple, lower-level elements they contain, making the working memory load

    more manageable. Our abilities to construct and use higher-order hierarchical

    cognitive configurations of knowledge structures in long-term memory might

    have emerged during evolution as a way of providing structure to the elements

     being dealt with by working memory (Sweller, 2003, 2004). Thus, by allowing

    multiple elements to be treated as a single element in working memory, long-term

    memory schematic structures may have, as one of their functions, the reduction of

    working memory load.
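    A toy illustration of this load-reducing function, assuming (as a simplification not made in the text) that working memory load can be approximated by the number of separate units that must be held at once. The hypothetical `chunk` function greedily replaces any run of elements matching a known schema with a single unit, trying longer schemas first.

```python
def chunk(elements: list[str], schemas: list[tuple[str, ...]]) -> list[str]:
    """Greedily replace runs of elements that match a known schema with one unit.

    Longer schemas are tried first, so the largest familiar unit wins.
    """
    ordered = sorted(schemas, key=len, reverse=True)
    units, i = [], 0
    while i < len(elements):
        for schema in ordered:
            if schema and tuple(elements[i:i + len(schema)]) == schema:
                units.append("".join(schema))  # the whole run becomes one unit
                i += len(schema)
                break
        else:
            units.append(elements[i])  # no schema applies: element stays separate
            i += 1
    return units


tokens = ["4", "x", "+", "2", "=", "3"]
novice = chunk(tokens, [])                        # no relevant schemas
learner = chunk(tokens, [("4", "x", "+", "2")])   # "4x + 2" is a familiar unit
print(len(novice), len(learner))  # 6 3
```

    With no relevant schema the equation occupies six units; with a schema for "4x + 2" it occupies three.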

    Specific schema selection in a particular situation is usually automated and

    quick. Our first impression about an unfamiliar person (which is said to be the

    most important), our comprehension of movies, fiction, music, humor, or art is

    guided by our acquired domain-specific schematic knowledge structures. Schemas

    guide our recall of different past events. Our memory usually retains the gist of a

    situation or event according to our schematic knowledge of it. The schema defines what is encoded and stored. When recalling the event, we create schema

    instantiations filling in missing information and inferring unavailable components

    using our schemas for the event. Sometimes such recall may produce distortions that fit our schemas or expectations (think, for example, of courtroom scenes in movies and fiction in which witnesses remember details they never actually witnessed).

    The structure of the schematic knowledge can be empirically assessed, for

    example, by asking students to group problems into clusters on the basis of

    similarity; to categorize problems after hearing only part of the text; to provide answers to problems when content words have been replaced by nonsense words;

    to solve problems when material in the text is ambiguous; to contrast problems

    using a nominated principle; to recall problems that were presented earlier; to

    identify which information within problems is necessary and sufficient for

    solution; and to classify problems in terms of whether the text of each problem

     provides sufficient, missing or irrelevant information for solution (‘text editing’)

    (Low & Over, 1992).

    Previously acquired schematic knowledge structures are the most important

    factor that influences learning new material. A student’s understanding of instruction involves instantiating appropriate familiar schemas that allow her or him to assimilate new information with prior knowledge. A failure to


    comprehend instruction might be caused by the lack of any appropriate schemas

    in long-term memory, by the lack of sufficient cues in the situation to elicit a schema, or by the learner applying a different schema than that intended by the instruction.

    Students' preexisting schemas often resist change: everything that cannot be

    understood within the available schematic frameworks is ignored or learned by

    rote. It is important to build new knowledge on top of students' existing schemas or to help them acquire an appropriate schematic framework by relating it to something already known. Useful instructional techniques include analogies and diagrams, which establish links with existing knowledge, and advance organizers, which elicit or activate existing relevant schemas or provide new ones (concept maps, headings, summaries at the start of chapters, etc.).

    Similar to production systems, a schema-based approach to representing

    knowledge provides a general framework that can be instantiated by specific

    theories. In all schema-based models of cognitive architecture, schemas are

    matched to the contents of working memory for recognition. If a schema is

     partially matched by the information in working memory, it will create further

    information to complete the match. Schemas instantiated in working memory

    could be modified or reorganized, then placed back into long-term memory and

    serve as a new, more specific schema for further recognition.

    Schema theories do not differentiate between procedural and declarative

    knowledge. Instructions for actions may be produced by matching a schema to a

    situation and adding missing pieces of information. For example, recognizing a

    situation as an instance of the schema for solving a simple linear algebraic equation and recognizing the values of the corresponding slots would provide directions for the necessary operations.

    Production rules could be considered as a form of schematic knowledge. There is

    a tendency for production-system and schema-based approaches to converge. For example, Koedinger and Anderson (1990) integrated the two approaches by constructing a computational (production-system-style) model of solving geometry problems using schema-based knowledge structures. The

    schemas (‘diagram configuration schemas’) were described as clusters of

    geometry facts that were associated with a single prototypical geometric image.
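    The matching-and-completion cycle can be sketched for the linear-equation case mentioned above. This is not Koedinger and Anderson's model; the `Schema` class, its slot names, and the default value used to complete a partial match are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Schema:
    name: str
    slots: tuple[str, ...]
    defaults: dict[str, float] = field(default_factory=dict)


# Hypothetical schema for equations of the form a*x + b = c.
LINEAR = Schema("linear equation a*x + b = c", ("a", "b", "c"), {"b": 0.0})


def instantiate(schema: Schema, situation: dict[str, float]) -> Optional[dict[str, float]]:
    """Match a situation against the schema's slots.

    A slot missing from the situation is filled from the schema's defaults
    (a partial match completed by the schema); if there is no default either,
    the match fails and None is returned.
    """
    filled = {}
    for slot in schema.slots:
        if slot in situation:
            filled[slot] = situation[slot]
        elif slot in schema.defaults:
            filled[slot] = schema.defaults[slot]
        else:
            return None
    return filled


def directions(bindings: dict[str, float]) -> list[str]:
    """Turn filled slots into the operations the schema prescribes (assumes a != 0)."""
    a, b, c = bindings["a"], bindings["b"], bindings["c"]
    return [f"subtract {b:g} from both sides",
            f"divide both sides by {a:g}",
            f"x = {(c - b) / a:g}"]


print(directions(instantiate(LINEAR, {"a": 4, "b": 2, "c": 3})))
```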

    In this book, schematic knowledge structures will be used as the basic unit

    and prevailing form of knowledge representations in long-term memory.

    Accordingly, the approach to human performance that is based on studies of

    schematic knowledge structures will be further referred to as a schema approach.


    PROBLEM SOLVING AND THE NATURE 

    OF HUMAN EXPERTISE 

    All of our purposeful cognitive activities can be considered as problem

    solving. Initially, in the 1950s and 1960s, most research studies on problem

    solving were concerned with knowledge-lean task domains that required no

    special training or background knowledge (for example, the famous ‘Tower of

    Hanoi’ task, various puzzles, etc.). The study of such tasks led to the formulation

    of a general theory of human problem solving (Newell & Simon, 1972). In this theory, a problem contains three main components: a given state, a goal state, and a set of operators for transforming the given state into the goal state. Problem-solving activity is considered as a search in the problem space that consists of

    separate problem states (knowledge states). The task of problem solving is to find

    a sequence of operators that can transform the initial state into a goal state within

    the problem space. 
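    This formulation maps directly onto a state-space search. The sketch below encodes the Tower of Hanoi mentioned above as a given state, a goal state, and a move operator, and assumes breadth-first search as the search strategy (the theory itself does not prescribe one). Exhaustive breadth-first search is a convenient way to make the problem space concrete; it is not a claim about how people search it.

```python
from collections import deque

State = tuple[tuple[int, ...], ...]  # three pegs, each a tuple of disk sizes, bottom first


def moves(state: State):
    """Yield (operator, next_state): move the top disk from peg i to peg j."""
    for i, src in enumerate(state):
        if not src:
            continue
        disk = src[-1]
        for j, dst in enumerate(state):
            if i != j and (not dst or dst[-1] > disk):  # never onto a smaller disk
                pegs = list(state)
                pegs[i] = src[:-1]
                pegs[j] = dst + (disk,)
                yield (i, j), tuple(pegs)


def solve(start: State, goal: State) -> list[tuple[int, int]]:
    """Breadth-first search of the problem space; returns a shortest operator sequence."""
    frontier = deque([start])
    parent = {start: None}  # state -> (previous state, operator that led here)
    while frontier:
        state = frontier.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                state, op = parent[state]
                plan.append(op)
            return plan[::-1]
        for op, nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = (state, op)
                frontier.append(nxt)
    raise ValueError("goal state is unreachable")


start = ((3, 2, 1), (), ())
goal = ((), (), (3, 2, 1))
plan = solve(start, goal)
print(len(plan))  # 7
```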

    So-called weak methods could be used in solving knowledge-lean tasks. We

    often use general heuristics (rules of thumb) for choosing necessary sequences of

    operators. For example, the difference reduction heuristic suggests choosing

    operators that maximally reduce the difference between the current state and the desired state. However, this method does not guarantee success in solving the problem, and more advanced methods are usually adopted. Forward chaining starts with the initial problem state, applies an operator selected by a heuristic, and then repeats the cycle from the resulting state. Backward chaining starts with the desired

    solution state, and a heuristically chosen operator is applied in reverse. A

    subgoaling strategy chooses an operator and forms a subgoal to find a way to

    change the current state so that the chosen operator could be applied. The method

    of solving by analogy uses the structure of the solution to one problem to obtain

    the solution to another problem (VanLehn, 1989).
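    The weakness of difference reduction is easy to demonstrate on a hypothetical number puzzle: reach 10 from 1 using the operators "add 3" and "double". The sketch applies, at each step, whichever operator brings the state closest to the goal (greedy hill climbing, one common reading of this heuristic) and stops when no operator reduces the difference any further.

```python
def difference_reduction(start: int, goal: int, operators, max_steps: int = 20) -> list[int]:
    """Greedy hill climbing: apply the operator that most reduces |goal - state|.

    Stops at the goal, or as soon as no operator reduces the difference.
    """
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        best = min((op(state) for op in operators), key=lambda s: abs(goal - s))
        if abs(goal - best) >= abs(goal - state):
            break  # no operator reduces the difference: the method is stuck
        state = best
        path.append(state)
    return path


ops = [lambda n: n + 3, lambda n: n * 2]
print(difference_reduction(1, 10, ops))  # [1, 4, 8, 11]
```

    The greedy path overshoots to 11 and stops there, although the sequence 1, 4, 7, 10 solves the problem.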

    The weak methods are often used in combined forms. For example, the GPS

    (General Problem Solver) production system-based mechanism developed by

     Newell and Simon (1972) uses the means-ends analysis method. This method

    consists of looking for an operation that reduces the difference between the goal

    and initial state, setting up subgoals whose solution provides a solution of the

    original goal, and building up a hierarchical plan to solve a problem. Means-ends

    analysis thus combines forward chaining and operator subgoaling: the current

    state of problem solving is compared to the goal state and actions are selected to

    reduce the difference (VanLehn, 1989).
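    A compact means-ends analysis in the spirit of GPS can be written in a few dozen lines. It assumes a STRIPS-like encoding in which each operator has precondition, add, and delete lists; the `Op`, `achieve`, and `achieve_all` names and the errand domain are hypothetical, and this is not Newell and Simon's program. An operator is chosen because its add list removes a difference, and its preconditions then become subgoals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    preconds: tuple                  # subgoals that must hold before the operator applies
    add: frozenset                   # facts the operator makes true
    delete: frozenset = frozenset()  # facts the operator makes false


def achieve(state, goal, ops, stack=()):
    """Reduce one difference; return (new_state, plan) or None."""
    if goal in state:
        return state, []
    if goal in stack:  # refuse circular subgoaling
        return None
    for op in ops:
        if goal in op.add:  # this operator would remove the difference
            result = achieve_all(state, op.preconds, ops, stack + (goal,))
            if result is not None:
                new_state, plan = result
                return (new_state - op.delete) | op.add, plan + [op.name]
    return None


def achieve_all(state, goals, ops, stack=()):
    """Achieve goals in order, then check that a later one did not undo an earlier one."""
    plan = []
    for goal in goals:
        result = achieve(state, goal, ops, stack)
        if result is None:
            return None
        state, steps = result
        plan += steps
    if not all(g in state for g in goals):
        return None
    return state, plan


# Hypothetical errand domain.
ops = [
    Op("drive son to school", ("son at home", "car works"),
       frozenset({"son at school"}), frozenset({"son at home"})),
    Op("shop installs battery", ("shop knows problem", "shop has money"),
       frozenset({"car works"})),
    Op("telephone shop", ("know phone number",), frozenset({"shop knows problem"})),
    Op("look up number", ("have phone book",), frozenset({"know phone number"})),
    Op("give shop money", ("have money",), frozenset({"shop has money"}),
       frozenset({"have money"})),
]
start = frozenset({"son at home", "have money", "have phone book"})
final_state, plan = achieve_all(start, ("son at school",), ops)
print(plan)
```

    The `stack` argument records the goals still pending while subgoals are pursued, which loosely corresponds to what a human solver would have to keep in mind during subgoaling.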


    In the early 1980s, experiments with puzzle problems demonstrated that, even

    after extensive problem solving by means-ends analysis, participants still did not induce a simple solution rule. Rule induction occurred only after some additional

    information had been provided (Mawer & Sweller, 1982; Sweller & Levine, 1982;

    Sweller, Mawer, & Howe, 1982). Empirical evidence was obtained that extensive

     practice in conventional problem solving was not an effective way of acquiring

    schemas that are required to successfully solve corresponding problems (Owen &

    Sweller, 1985; Sweller & Cooper, 1985; Sweller & Levine, 1982; Sweller,

    Mawer, & Ward, 1983). These studies suggested that a means-ends strategy could

    inhibit schema acquisition.

    A means-ends strategy focuses attention on specific features of the problem

    situation required to reach the goal and on reducing the difference between the current

    and goal problem states by selecting proper operators. Maintaining subgoals and

    considering alternative solution pathways are cognitively demanding mental

    activities that might result in working memory overload. Additionally, these

    activities are unrelated to learning solution schemas that are critical for successful

    future problem solving. They reduce resources devoted to learning other

    important aspects of problem structure. For example, studies of two-step problems

    demonstrated that cognitive load might be very high at subgoal stages, resulting in more errors than at the final goal stage (Ayres & Sweller, 1990).

    Sweller and Levine (1982) demonstrated rapid learning of maze problem-solving schemas when the specific goal state was unknown, and it was not

     possible to reduce differences between the goal and given problem states. Sweller,

    Mawer, and Ward (1983) found that using a means-ends strategy can actually

    impair learning, and that less directed exploration of the problems facilitated

    acquisition of useful problem schemas. They used simple physics and geometry problems without a specific goal stated (goal-free problems such as "Calculate the value of as many variables as you can") and observed enhanced development of problem-solving skills. Owen and Sweller (1985) found that problem solvers

    using a means-ends strategy made significantly more errors than those using other

    methods, presumably due to the working memory load associated with means-ends analysis.

    In a theoretical investigation of the cognitive (working memory) load

     phenomena, Sweller (1988) constructed and analyzed a computational model of

    cognitive processes based on a theory of production systems (Newell & Simon,

    1972). The model operates by matching elements on the condition side of each

    production to elements in working memory (for example, the knowns, unknowns, goal, and possible equations or theorems). If the condition side of a

     production is matched by some of the elements in working memory, the


     production can fire, and its action alters the content of working memory allowing

    other productions to fire. The cognitive load in such a model could be measured by considering the number of statements in working memory, the number of

     productions, the number of cycles to solution, and the total number of conditions

    matched. Application of this model to novice cognitive behavior in various

    instructional procedures provided evidence of the heavy cognitive load associated

    with a means-ends strategy compared with a forward-working goal-free strategy.

    It also explained why goal-free problems and worked examples were more effective means of acquiring schemas than conventional problem solving

    (Sweller, 1988; Ayres & Sweller, 1990).
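    A toy forward-chaining production system in the spirit of this model is sketched below on a goal-free kinematics problem. The `Production` class, the formulas chosen, and the way the counts are tallied are assumptions made for illustration; the numbers describe this toy only, not the results reported by Sweller (1988).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Production:
    name: str
    conditions: tuple[str, ...]  # facts that must already be in working memory
    produces: str                # fact the action adds to working memory
    action: Callable[[dict], float]


def run(memory: dict[str, float], productions: list[Production]) -> dict[str, int]:
    """Match-fire cycle with rough load counts (hypothetical metrics)."""
    cycles = conditions_matched = 0
    fired = True
    while fired:
        fired = False
        for p in productions:
            if p.produces in memory:
                continue  # already derived, so this production adds nothing new
            conditions_matched += sum(c in memory for c in p.conditions)
            if all(c in memory for c in p.conditions):
                memory[p.produces] = p.action(memory)  # fire: alter working memory
                cycles += 1
                fired = True
                break  # one production fires per cycle, then matching restarts
    return {"cycles": cycles, "conditions_matched": conditions_matched,
            "statements": len(memory), "productions": len(productions)}


# Goal-free problem: "find as many variables as you can" from u, a, t.
productions = [
    Production("average velocity", ("u", "v"), "v_avg", lambda m: (m["u"] + m["v"]) / 2),
    Production("final velocity", ("u", "a", "t"), "v", lambda m: m["u"] + m["a"] * m["t"]),
    Production("distance", ("u", "a", "t"), "s",
               lambda m: m["u"] * m["t"] + m["a"] * m["t"] ** 2 / 2),
]
memory = {"u": 0.0, "a": 2.0, "t": 3.0}
print(run(memory, productions))
# {'cycles': 3, 'conditions_matched': 9, 'statements': 6, 'productions': 3}
```

    Because nothing in this system refers to a goal, every production whose conditions are satisfied can fire, which is the forward-working behavior that goal-free problems encourage.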

    Since the late 1970s, the research focus in problem solving shifted to studying

    knowledge-rich task domains (algebra, geometry, physics, thermodynamics,

    computer programming, chess, bridge, etc.) that require a substantial knowledge base as a prerequisite. Problem solving in such domains has additional

    complexities. Representation of a problem requires a great deal of domain

    knowledge, and operators that are usually used are domain-specific operators. The

    central questions of research in such domains are how knowledge is used to build up a problem representation and how it influences the actual problem-solving process (Reimann & Chi, 1989).

    In semantically rich domains, problem solving involves searching one's

    knowledge of the domain in order to find the operators for solving the problem.

    Research on the use of knowledge in problem solving suggests that people use

    two types of domain-specific knowledge to solve problems: declarative

    conceptual knowledge (knowledge of the principles of the domain) and procedural

    knowledge (knowledge of how to perform cognitive activities). Procedural

    knowledge may be described as a set of production rules that define actions for

    achieving goals (Anderson, 1983). Conceptual and procedural knowledge in

    problem solving can be considered as organized into problem schemas. They form the general framework of knowledge that corresponds to classes of problems.

    Problem solving in complex domains thus can be viewed as finding an

    appropriate problem schema in long-term memory and filling in this schema with

    the specific parameters of the problem (Chi, Feltovich, & Glaser, 1981; Chi &

    Glaser, 1985). The problem schema determines what conceptual knowledge is

    used to build a representation of the problem statement, and what procedures are

    used to solve the problem. Much research in knowledge-rich domains is

    concerned with the differences between expert and novice problem solving. It has

     become evident that experts' behavior is mostly determined by their knowledge base. Therefore, the learning processes in which the experts acquired this

    knowledge are critical in explaining their performance. The focus of attention in


    the later studies shifted to learning theories as theories of the acquisition of

    expertise (VanLehn, 1989).

    A considerable number of recent research studies in cognitive psychology

    have been concerned with the investigation of the structures and processes of

    human competent performance as a consequence of learning. It is generally

    accepted that development of expert performance is a very complex process

    involving a great deal of deliberate effort. Studies have shown that at least 10

    years of practice are necessary for people in various fields of culture and science

    to reach superior levels of skilled performance (Ericsson & Charness, 1994;

    Ericsson, Krampe, & Tesch-Römer, 1993; Simon & Chase, 1973).

    Expert performance is usually acquired during extensive deliberate practice in

    a domain. Such practice should be organized at an appropriate and challenging

    level of difficulty, allow steady skill refinement by repetition and error correction,

    and provide informative feedback to the learner (Ericsson et al., 1993; Ericsson &

    Lehman, 1996). Competent expert performance generally requires well-developed

    cognitive skills, well-organized structures of knowledge, and self-regulatory

     performance control or metacognitive strategies (Glaser, 1990).

    Well-developed cognitive skills, a major characteristic of expert performance, require functional (related to conditions of applicability) and automated knowledge (Fitts & Posner, 1967; Anderson, 1983, 1993; Klahr, Langley, &

     Neches, 1987). The process of skill learning is claimed to occur in several stages.

    In the first stage (cognitive stage), a description of the procedure is learned in the

    form of declarative knowledge. In the second stage (associative stage), the

    declarative information is transformed into a procedural form, and a set of

     procedures for performing the skill is acquired. Such a process of converting

    declarative knowledge into a procedural form is called proceduralization. In this

    stage, two forms of knowledge (declarative and procedural) coexist. In the third

    stage (autonomous stage), the skill becomes more rapid and automatic (Anderson, 1983).

    When knowledge becomes automated during the development of proficiency,

    conscious processing capacity can be concentrated on higher levels of cognition.

    Automated performance requires only limited attentional capacity. Processing that once demanded active control can, after extensive practice, become automatic, freeing limited attentional capacity for other tasks (Kotovsky, Hayes, & Simon, 1985; Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). For example,

    while the use of declarative knowledge initially requires much conscious

    cognitive processing, automatic application of proceduralized knowledge frees working memory and allows its capacity to be used for the processing of new

    knowledge. Intensive training on certain procedural elements of a task can make


    them more automatic and free cognitive capacity for other more creative elements.

    This is especially important for transfer of training (Cooper & Sweller, 1987; Howell & Cooke, 1989). Automated lower-level routine procedures enable

    learners to concentrate on finding new ways of applying their knowledge in

    unfamiliar situations.

    The process of learning could be considered as the acquisition of new

    schemas that eliminate the need to apply weak problem-solving methods (e.g.,

    means-ends analysis) to solve future similar problems. The result is a shift from a

    novice strategy of working backward from the goal using means-ends analysis

    and subgoaling, to a more expert knowledge-based strategy of working forward

    from the initial state to the goal. Availability of a sufficient set of relevant

    domain-specific schematic knowledge structures that could be used in performing

    tasks is an important feature of a competent human performance. With experience

    in a domain, knowledge is organized into larger interconnected aggregate

    structures that explain the skilled performance of experts (Chi, Glaser, & Farr,

    1988; Lord & Maher, 1991).

    Under a schema-based approach, learning can take different forms. Schema

    evolution is a central mechanism in the development of expertise. New

    information could be encoded in terms of existing schemas without involving any new schemas. Schemas evolve as they are applied and utilized as learner

    experience in the domain increases. Another form of learning is restructuring or

    creation of new schemas. In order to explain how schemas can be built up through

    experience, Rumelhart and Norman (1981) proposed a mechanism of learning by

    analogy. Initially, a new schema could be created by modeling it on an existing

    schema followed by a process of refinement (tuning). When a learner encounters a

    new situation in a familiar domain, she or he tries to interpret it using existing

    schemas. If none of them suits the situation, the best existing schema can serve as

    a model from which to start the tuning process. The characteristics of this model that do not contradict the new situation are carried over into the new schema.
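    Learning by analogy as described here can be sketched with schemas reduced to feature dictionaries, which is a simplifying assumption (Rumelhart and Norman did not specify this representation). The hypothetical `best_model` picks the existing schema that agrees with the new situation on the most features, and `tune` keeps only those of its features that the situation does not contradict.

```python
def best_model(schemas: dict[str, dict[str, str]], situation: dict[str, str]) -> str:
    """Pick the existing schema that agrees with the situation on the most features."""
    def agreement(features: dict[str, str]) -> int:
        return sum(features.get(k) == v for k, v in situation.items())
    return max(schemas, key=lambda name: agreement(schemas[name]))


def tune(model: dict[str, str], situation: dict[str, str]) -> dict[str, str]:
    """New schema: model features not contradicted by the situation, plus the situation."""
    kept = {k: v for k, v in model.items() if k not in situation or situation[k] == v}
    return {**kept, **situation}


# Hypothetical schemas and a new situation (a first encounter with a motorcycle).
schemas = {
    "car": {"wheels": "4", "engine": "yes", "steering": "wheel", "rider": "inside"},
    "bicycle": {"wheels": "2", "engine": "no", "steering": "handlebar", "rider": "on top"},
}
motorcycle = {"wheels": "2", "engine": "yes", "rider": "on top"}

model = best_model(schemas, motorcycle)
print(model, tune(schemas[model], motorcycle))
```

    With these inputs the bicycle schema is chosen as the model, its "engine: no" feature is dropped because the situation contradicts it, and its handlebar steering is carried over into the new schema.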

    Planning and self-regulatory (metacognitive) skills allow experts to control

    their performance, assess their work, and predict its results. These self-regulatory

    skills are an important condition of expert ability to use the available knowledge

     base (Chi, Bassok, Lewis, Reimann, & Glaser, 1989; Larkin, McDermott, Simon,

    & Simon, 1980). Chi et al. (1989) proposed that students learn and understand

    examples of problem solutions via the self-explanations they give while studying.

    Students who are successful problem-solvers tend to study example exercises by

    explaining and providing justifications for each action and relating these actions to the principles and concepts of the domain. These students read the example

    with understanding and self-monitoring. Students who are less successful


     problem-solvers do not connect their explanations (if any) with their

    understanding of the principles of the domain. During problem solving, successful students may use examples as a specific reference, whereas less successful students reread them in search of ready-made solutions. The level of performance

    significantly depends on the metacognitive skills that learners bring to the task.

    Cognitive studies of human performance and learning have the potential to

    greatly influence instructional design principles. Generally, instructional design

    should minimize learners' involvement in activities that overburden their limited

    working memory and be adapted to the learners’ available knowledge structures

    in long-term memory. Appropriate design of instruction should be based on the

    knowledge of characteristics of expert performance, expert-novice differences,

    and the transition process from novice to expert. Cognitive models of expert

     performance and their influence on the design of instruction are considered in the

    following chapter.


    Chapter 2

    COGNITIVE STUDIES OF EXPERT-NOVICE

    DIFFERENCES AND DESIGN OF INSTRUCTION 

    SCHEMA-BASED APPROACH TO STUDYING 

    EXPERT PERFORMANCE 

    The purpose of cognitive studies of human expertise is to identify the

    cognitive structures and processes responsible for skilled performance. Expert

     performance has been studied in a variety of domains, for example, chess (de

    Groot, 1965), physics (Chi, Feltovich, & Glaser, 1981; Larkin, McDermott,

    Simon, & Simon, 1980), programming (Anderson, Boyle, & Reiser, 1985) and

    radiology (Lesgold, Rubinson, Feltovich, Glaser, Klopfer, & Wang, 1988), to

    name just a few. Various techniques and approaches have been applied to find out

    the organization of experts' knowledge, the characteristics of their understanding,

    information processing requirements and the nature of competency in such areas

    as chess (Chase & Simon, 1973; Simon, 1979), geometry (Greeno, 1977),

    genetics (Smith & Goodman, 1984), physics (Larkin & Reif, 1976), electronic

    troubleshooting (Brown & Duguid, 1989; Forbus & Gentner, 1986; Gitomer,

    1988; Lesgold & Lajoie, 1991; Morris & Rouse, 1985; Perez, 1991; Rasmussen,

    1986; Swezey, Perez, & Allen, 1988; Tenney & Kurland, 1988; Wiggs & Perez,

    1988), and mechanical troubleshooting (de Kleer & Brown, 1983, 1984; diSessa,

    1983; Forbus, 1984; Hegarty, 1991; Hegarty & Just, 1989; Heller & Reif, 1984;

    Miyake, 1986; Reif, 1987; Stanfill, 1983; White, 1983; White & Frederiksen,

    1986).

    As discussed in the previous chapter, schemas are a major type of knowledge

    representation in long-term memory that reflects prototypical features of objects,


    situations, and events. To understand or interpret incoming information, the

    human cognitive system matches this information with existing schemas (Rumelhart & Norman, 1983). In general, studies of expert-novice differences demonstrate that expertise is not so much a function of superior problem-solving strategies or a better working memory as of a better domain-specific schematic knowledge base.

    Chunks have played an important role in the development of the

    understanding of expert-novice differences. Since Miller's (1956) finding that

    short-term memory is limited to approximately seven units, or chunks, of

    information, a chunk has served as a unit of measurement for memory capacity. A

    chunk can be considered as a generalized example of a schema. De Groot (1965,

    1966) was one of the first psychologists who investigated expert-novice

    differences and demonstrated that expertise can be explained by the enormous

    amounts of knowledge that experts can access. In his classic studies, chess players

    had to reconstruct the positions of chess pieces on a board, after a brief exposure

    (5 seconds). De Groot's findings that chess masters could recall many more pieces

    from briefly exposed real chess positions than novices were explained by masters

    having larger chunks. Chase and Simon (1973) noticed that experts placed chess

    pieces on the board in groups that represented meaningful configurations. The experts did not show superior performance when random placements of the chess

     pieces were used.

    Egan and Schwartz (1979) studied expertise in electronics with a

    methodology similar to that used by Chase and Simon (1973) in studying chess

    expertise. They found that experts could reconstruct large circuit diagrams from

    memory, recalling them in chunks of meaningfully related components. The

    experts were better than novices at recalling meaningful (not random) circuit

    diagrams. The size, rather than number, of recalled chunks increased with study

    time. Chase and Ericsson (1982) further suggested that the superior memory of chess masters and other experts was due to possession of schema structures with

    specific slots filled in with the index information that served as retrieval cues. The

    material could be recalled by reading out the contents of these slots and selecting

    schemas that corresponded to familiar stimuli.

    The schema-based approach was successfully used to explain various

     phenomena related to expert performance and differences between experts and

    novices (Chi et al., 1981; Reimann & Chi, 1989). For example, in the domain of

     physics, experts' categories were based on the principles of mechanics

    (conservation of energy and momentum, etc.), whereas novices' categories were based on objects and surface features stated in each specific problem (inclined plane, spring, etc.). In the case of an object being balanced on an inclined plane,


    the experts saw it as an example of a class of problems requiring a balance-of-forces approach, while novices saw it as an inclined-plane problem type. The failure of a novice to solve this problem may result from the fact that different inclined-plane tasks may require different approaches (based on balance of forces,

    energy conservation, etc.), and the presence of the incline plane alone does not

    determine the appropriate approach.

    One of the reasons for novices' difficulties in problem solving is that they

    activate only lower-level schemas that incorporate only surface aspects of the

     problem, whereas experts activate higher-level schemas that contain information

    critical to the problem solution (Chi & Glaser, 1985). Thus, experts categorize

     problems in terms of deep structures such as the laws used to solve the problems,

    while novices categorize problems based on surface structures such as common

     physical attributes. The same problem may elicit different schemas for experts

    than for novices.
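    A deliberately simplified version of a problem-sorting task shows how the same set of problems falls into different groups depending on which features are used. The problem descriptions are hypothetical; in real sorting studies the underlying principle has to be inferred from the problem text, which is exactly what novices find difficult.

```python
from collections import defaultdict

# Hypothetical problems: a surface feature plus the principle needed to solve each.
problems = [
    {"id": 1, "surface": "inclined plane", "principle": "balance of forces"},
    {"id": 2, "surface": "inclined plane", "principle": "energy conservation"},
    {"id": 3, "surface": "spring", "principle": "energy conservation"},
    {"id": 4, "surface": "pulley", "principle": "balance of forces"},
]


def sort_problems(problems: list[dict], key: str) -> dict[str, list[int]]:
    """Group problem ids by the chosen feature."""
    groups = defaultdict(list)
    for p in problems:
        groups[p[key]].append(p["id"])
    return dict(groups)


novice = sort_problems(problems, "surface")    # grouped by objects in the problem
expert = sort_problems(problems, "principle")  # grouped by the governing law
print(novice)
print(expert)
```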

    Schematic knowledge structures in long-term memory effectively provide

    necessary executive guidance during high-level cognitive processing (Sweller,

    2003). Without such guidance and in the absence of external instructions, people

    usually resort to random search or weak problem-solving methods such as means-

    ends analysis (a gradual reduction of differences between current and goal problem states). Such methods are cognitively inefficient and time consuming.

    They may impose a heavy working memory load interfering with construction of

    new schemas (Sweller, 1988).

    In contrast, when experts in a domain encounter a familiar problem situation,

    they rapidly retrieve appropriate previously acquired schemas from long-term

    memory and apply them in a cognitively efficient way (Chi et al., 1981; Larkin et al., 1980). Schemas allow them to categorize different problem states and decide on the most appropriate solutions. Due to their available knowledge base in long-term memory, experts are able to avoid cognitively inefficient mental activities and perform with greater accuracy and lower cognitive load.

    Schematic knowledge structures can be described functionally by indicating how a person at a specific level of schema acquisition would act in relevant problem situations. For example, without any schematic knowledge of procedures for solving the equation 4x + 2 = 3, and in the absence of any guidance, a student will treat each symbol separately and may try a means-ends approach, reducing differences between the current problem state and the goal state (x = ?), or may attempt to apply various random operations to the numbers.

    Another student, with some previously acquired knowledge of an appropriate procedure, may immediately proceed to subtract the constant term 2 from both sides of the equation: 4x + 2 - 2 = 3 - 2. The whole combination of elements (e.g., 4x + 2) will be treated as a single meaningful unit, or chunk. If a student has practiced this kind of equation extensively, the schema for this procedure may become automated, and her or his first solution step will be 4x = 1. Another, even more experienced student may have all the relevant solution procedures well learned or automated and would write the final answer (x = 1/4) almost immediately.
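    The three levels of solution described above can be sketched as progressively shorter written records of the same procedure. This minimal sketch assumes that automation can be modeled simply as skipping written intermediate steps. The function names `solve_steps` and `written_steps` and the level numbering (0, 1, 2) are hypothetical, and the novice who treats each symbol separately is not modeled.

```python
from fractions import Fraction
from typing import List


def solve_steps(a: int, b: int, c: int) -> List[str]:
    """Intermediate equations for solving a*x + b = c, written out in full."""
    if a == 0:
        raise ValueError("the coefficient a must be non-zero")
    a_, b_, c_ = Fraction(a), Fraction(b), Fraction(c)
    rhs = c_ - b_  # subtract the constant term b from both sides
    x = rhs / a_   # divide both sides by the coefficient a
    return [
        f"{a_}x + {b_} - {b_} = {c_} - {b_}",
        f"{a_}x = {rhs}",
        f"x = {x}",
    ]


def written_steps(a: int, b: int, c: int, level: int) -> List[str]:
    """Hypothetical model of automation: level 0 writes every step,
    level 1 starts at the reduced equation, level 2 writes only the answer."""
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1, or 2")
    return solve_steps(a, b, c)[level:]


for level in range(3):
    print(level, written_steps(4, 2, 3, level))
```

    Using `Fraction` keeps the answer exact (1/4 rather than 0.25), which matches how the answer is written in the text.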

    Similar expert-novice differences could be demonstrated in other areas. A novice electrician could treat each symbol in a wiring diagram as a separate element, while an experienced professional would see the whole diagram as representing a complete system. To someone who does not know a foreign language, printed text in that language might look like a collection of unfamiliar symbols, while fluent native readers would be able to make sense of the whole text. They would treat words, or even combinations of words, as single elements.

    By combining multiple elements of information into a single chunk in working memory, long-term memory schemas allow experts to avoid processing overwhelming amounts of information and to effectively reduce working memory load during high-level cognitive processing. Experts are also able to bypass working memory limitations because extensive practice has made many of their schemas highly automated. Human cognitive architecture has evolved in such a way that information processing changes significantly as the information becomes more familiar to an individual (Sweller, 2003). Schematic knowledge structures held in long-term memory significantly influence the content and characteristics of working memory, effectively transforming it into long-term working memory (Ericsson & Kintsch, 1995).
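    The chunking effect described above can be illustrated by counting how many units a learner has to hold in mind. This is a minimal sketch that rests on two assumptions: that working memory load can be approximated by the number of units left after greedy longest-match grouping, and that the letter string "FBICIA" and its known chunks are hypothetical examples. It is not a measurement procedure from the cited literature.

```python
from typing import List, Set


def chunk(sequence: str, known_chunks: Set[str]) -> List[str]:
    """Greedy longest-match chunking: group elements into the largest
    known chunk available; unknown elements stay as single units."""
    longest = max([1] + [len(c) for c in known_chunks])
    units: List[str] = []
    i = 0
    while i < len(sequence):
        for size in range(min(longest, len(sequence) - i), 0, -1):
            piece = sequence[i:i + size]
            if size == 1 or piece in known_chunks:
                units.append(piece)
                i += size
                break
    return units


letters = "FBICIA"
print(len(chunk(letters, set())))           # 6 separate elements for a novice
print(len(chunk(letters, {"FBI", "CIA"})))  # 2 chunks for someone who knows both
```

    The same six letters cost six units without any stored schemas but only two units with them. This mirrors the text's point that long-term memory knowledge changes the effective load on working memory.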

    An expert’s routine problem solving in a familiar domain usually involves selecting an appropriate schema, adapting it to the problem, and executing the solution procedure. Often this occurs as direct recognition early in the perception of the problem (Chi, Feltovich, & Glaser, 1981). Non-routine problem solving includes additional procedures, such as search (when more than one schema is applicable to the situation) or combining schemas (when no single schema covers the whole problem) (Larkin, 1985). Substantial evidence has accumulated that a schema theory of problem solving can successfully explain experts' performance in various task domains (Reimann & Chi, 1989).

    Building a problem representation is a key process in problem solving (Larkin, 1985; McDermott & Larkin, 1978; Simon & Simon, 1978). Experts have been found to spend more time on qualitative analysis of the problem and on building explicit representations of the situation (for example, by drawing diagrams of causal relationships between the objects). Experts also form more abstract and enriched representations than novices do. For example, according to Chi, Feltovich, and Glaser (1981), experts classify physics problems based on abstract physics categories and principles, while novices classify them according to surface characteristics of the problem. Thus, the level of problem representation depends on the solver's problem schemas. An initial cue (the first sentences of the problem statement, etc.) may activate a particular schema that is then matched to the problem. Any mismatch results in the rejection of that schema and the triggering of another schema.
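    The cue-then-match cycle described above can be sketched as a simple loop. The sketch is a direct rendering of the verbal description, not a cognitive model from the cited studies. The `Schema` type, the schema names, the cue words, and the feature labels are all hypothetical, chosen to echo the inclined-plane example discussed earlier.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Schema:
    name: str
    cues: FrozenSet[str]      # surface words that can activate the schema
    requires: FrozenSet[str]  # problem features the schema needs in order to fit


def select_schema(first_sentence: str, features: Set[str],
                  schemas: List[Schema]) -> Tuple[Optional[str], List[str]]:
    """Cue-then-verify loop: a cue in the opening sentence activates a schema,
    which is then matched against the problem; a mismatch rejects it and the
    next cued schema is tried. Returns the accepted schema and all tried ones."""
    words = set(first_sentence.lower().split())
    tried: List[str] = []
    for schema in schemas:
        if not (schema.cues & words):
            continue  # no cue present: this schema is never activated
        tried.append(schema.name)
        if schema.requires <= features:
            return schema.name, tried  # match: keep this schema
        # mismatch: reject this schema and let the next one be triggered
    return None, tried


SCHEMAS = [
    Schema("balance of forces", frozenset({"inclined", "block"}),
           frozenset({"forces_given"})),
    Schema("energy conservation", frozenset({"inclined", "slides"}),
           frozenset({"height_given", "speed_asked"})),
]
result = select_schema("A block slides down an inclined plane",
                       {"height_given", "speed_asked"}, SCHEMAS)
print(result)
```

    Here the words "inclined" and "block" first activate the balance-of-forces schema. It is rejected because the problem gives heights and asks for a speed, so the energy-conservation schema is triggered and accepted.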

    Successful problem solving in technical domains depends on the solver's schemas for the causal relations between the components of a technical system, which allow mental simulations of the system's operation (de Kleer & Brown, 1983; Gentner & Stevens, 1983; Miyake, 1986). Providing learners with a causal description of a device’s operation, in addition to information about its components, was shown to enhance their ability to operate the device (Kieras & Bovair, 1984; Mayer, 1989a).

    Different types of schemas are appropriate for solving different types of problems. At higher levels of skill, the choice of schematic knowledge type is determined by the higher-level structures in which an expert's representations are organized (Hegarty, 1991). Initially, problem schemas are specific to the situations from which they were induced. With experience, they become indexed by general principles, and problem solving becomes faster and less effortful. Organizing the solver's knowledge into large groups of chunks or schemas decreases the demands on working memory and allows learners to activate appropriate procedures. As soon as experts retrieve a problem schema, they automatically access the procedures for solving the problem (Chi et al., 1981; Smith, 1991).

    The development of a problem representation can be viewed as a sequence of attempts to refine a schema, a process that depends on the structure of the solver's domain-specific knowledge. As a result, experts spend more time on planning and use forward-working, efficient problem-solving processes (Reimann & Chi, 1989). Empirical studies in various domains have revealed that problem-solving strategies are determined by the nature of the problem representations, differences in the organization of knowledge, and the number of domain-specific problem schemas that solvers have acquired through experience in a domain (Larkin, 1985; Lesgold, Feltovich, Glaser, & Wang, 1981).

    Experts’ performance is schema-driven. Experts possess more domain-specific schemas and can access and use them more efficiently than novices. Experts work forward, deriving the appropriate problem schema from the problem statement. In contrast, novices’ performance is goal-driven. Novices work backward from the goal, searching for operators that will allow them to derive the needed solution. However, working backward is a default strategy that both experts and novices use when they have no schema for a given type of problem. In a novel situation, experts use various general heuristics together with domain-specific knowledge (Perkins, Schwartz & Simmon, 1991; Rist, 1989; Schultz & Luchheud, 1991).
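    The contrast between working forward and working backward resembles forward and backward chaining over production rules, as used in artificial intelligence. The sketch below shows that analogy only. It is not the model used in the cited studies, and the two kinematics rules and the quantity names are hypothetical examples. Forward chaining starts from the givens and fires every applicable rule. Backward chaining starts from the goal and searches for rules whose conclusions match it.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)

# Hypothetical kinematics rules: each derives one quantity from known ones.
RULES: List[Rule] = [
    (frozenset({"v0", "a", "t"}), "v"),  # v = v0 + a*t
    (frozenset({"v0", "v", "t"}), "s"),  # s = (v0 + v) * t / 2
]


def forward_chain(givens: Set[str], rules: List[Rule]) -> List[str]:
    """Work forward from the givens, firing every rule whose premises are known.
    Returns the quantities in the order they were derived."""
    known: Set[str] = set(givens)
    derived: List[str] = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                derived.append(conclusion)
                changed = True
    return derived


def backward_chain(goal: str, givens: Set[str], rules: List[Rule],
                   pending: FrozenSet[str] = frozenset()) -> bool:
    """Work backward from the goal: find a rule that concludes it and try to
    establish each of its premises as a subgoal."""
    if goal in givens:
        return True
    if goal in pending:
        return False  # avoid looping on circular subgoals
    return any(
        conclusion == goal
        and all(backward_chain(p, givens, rules, pending | {goal}) for p in premises)
        for premises, conclusion in rules
    )


givens = {"v0", "a", "t"}
print(forward_chain(givens, RULES))        # ['v', 's']
print(backward_chain("s", givens, RULES))  # True
```

    Forward chaining needs no goal at all, and this is also the logic behind the goal-free problems discussed later in this chapter. Backward chaining is driven entirely by the goal.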

    Thus, expert performance depends on available problem representations, the knowledge base (facts, concepts, principles, knowledge of a system, and rules for how to use this knowledge), the availability of appropriate domain-specific schemas, general procedures (strategies, heuristics, algorithms), and the relations among all these elements (Hart, 1986; Lesgold & Lajoie, 1991). According to Chi, Glaser, and Farr (1988), the main features of competent expert performance are:

    1) domain-specificity (experts exhibit superior performance mainly in their own domains);
    2) perception of problem situations in terms of large meaningful patterns;
    3) high speed of performance;
    4) a superior, well-organized knowledge base in long-term memory;
    5) deep-level, principle-based problem representations;
    6) thorough qualitative analysis of problems; and
    7) strong self-monitoring skills.

    COGNITIVE STUDIES OF EXPERT-NOVICE DIFFERENCES AND INSTRUCTIONAL APPROACHES

    Most studies of expertise have focused on discrete expert-novice differences in solving specific tasks. The existence of a continuum between novices and experts has often been ignored. As a result, our knowledge about the development of expertise, and about the changes in cognitive processes as expertise is acquired, is limited. Groen and Patel (1991) suggested four developmental levels: 1) novices with no training in the domain (possessing only common-sense knowledge and everyday experience); 2) intermediates who have received some instruction in the domain; 3) sub-experts who have expertise in a closely related domain (they may also be viewed as intermediates); and 4) experts who are always correct in solving routine problems and solve them by forward reasoning. Novices cannot learn expert approaches directly: when expert rules are taught to beginners, they form isolated pieces of knowledge that are not retained for long (Groen & Patel, 1991). Thus, existing theories of expert performance cannot be applied directly to instruction, and theoretical models of students' transitions from one level to the next need to be developed.


    Expert routine problem solving is traditionally associated with a forward-working strategy, while novices tend to work backward. With unfamiliar problems, experts also use backward reasoning. Studies by Sweller and his colleagues (Mawer & Sweller, 1982; Sweller & Levine, 1982; Sweller et al., 1983) shed some light on when the switch from backward to forward working occurs during the development of expertise, and on which factors facilitate it. They demonstrated that means-ends analysis might prevent the acquisition of problem-specific rules because the method could leave no cognitive resources available for meaningful learning.

    Rule acquisition occurred or improved under conditions where subjects were provided with information in addition to the problem goal (for example, a set of subgoals) or were given goal-free problems. Sweller et al. (1983) hypothesized that the main factor responsible for this result was the kind of information a learner focuses on during problem solving. If knowledge or schema acquisition is an aim of problem solving, then the influence of the goal as a control mechanism should be reduced.

    In some studies, intermediate-level medical students who reasoned forward performed more poorly than either experts or novices (Groen & Patel, 1991). This result was explained by their dogmatic reliance on existing basic science knowledge. When students' knowledge contains misconceptions, forward reasoning might be harmful for learning. If they reasoned backward, the misconceptions would remain merely temporary hypotheses. It was suggested that in such cases emphasis should be placed on self-explanations and on testing their adequacy (explanation-based learning) rather than on correct problem solving (Groen & Patel, 1991).

    Most of the experimental evidence on expert-novice differences was obtained by contrasting the performance of experts and novices. Schoenfeld and Hermann (1982) conducted one of the first longitudinal studies of the relationship between problem perception and expertise. Students' perceptions of mathematical problems were examined before and after intensive training in mathematical problem solving. Before the training, the students, as novices, sorted problems based on surface components mentioned in the problem statement. After the training, they sorted them in a more expert-like way, according to the principles of problem solution. Thus, problem perception, and the problem schemas on which such perception is based, changed as learners became more experienced in the domain.

    With the development of expertise, problem schemas change in their level of specificity (diSessa, 1983; Forbus & Gentner, 1986; Kaiser, Jonides, & Alexander, 1986). Initially induced from specific situations, they become more general and indexed by the underlying principles (Chi et al., 1981). At higher levels of development, schemas may also change from qualitative to quantitative, representing the relationships between components of problem situations more precisely (Forbus & Gentner, 1986; Hegarty, Just, & Morrison, 1988). As people gain more experience with technical systems, they learn the relations between the systems' common subsystems and learn to chunk components of systems into these subsystems (Hegarty, 1991). New information is then assimilated into existing sophisticated knowledge structures.

    Learning mechanisms and strategies evolve as a learner becomes more experienced (Langley & Simon, 1981). Lesgold et al. (1988) hypothesized that early learning is perceptual and differs from later cognitive learning. Experts use schemas to interpret incoming information, intermediates often reshape their perceptions to fit the schema, whereas novices rely completely on their perceptions. The previously mentioned decline in performance at intermediate levels may also be due to this shift from perceptual learning to cognitive schema-based learning.

    According to the triarchic/global/local architecture of expert cognition (Sternberg & Frensch, 1992), an expert who is processing information from new domains relies mostly on controlled, global processing. If the information belongs to the expert's narrow area of expertise, she or he relies mostly on automatic, local processing. Such local processing systems can operate in parallel, can be automated, and are characterized by almost unlimited processing capacity. As expertise develops, learned portions of processing procedures are transferred to a local processing system. This enables experts to automate more processing and thus to free global processing resources for dealing with new situations (Sternberg & Frensch, 1992).

    However, experts may be inflexible in new situations because it is difficult to reorganize an automated schema. Experiments with bridge players confirmed that experts were more affected when new task demands required changing deep, abstract principles than when they required changing surface features. Novices were more affected by surface changes than by deep, abstract changes (Sternberg & Frensch, 1992). Nevertheless, Schraagen (1993) demonstrated that when domain-specific knowledge is missing, experts can still maintain a more structured approach than novices by making use of more abstract, high-level knowledge.

    According to the theory of skill acquisition (Anderson, 1983), instruction in specific performance procedures must be preceded by instruction in the concepts, rules, and principles of how things work (declarative knowledge). In addition to teaching the theoretical principles, instruction should develop the ability to apply them in concrete situations (Morris & Rouse, 1985). A purely procedural approach is not sufficient, because it is impossible to predict all possible situations in advance, especially in complex domains such as modern digital electronics. Thus, training should combine knowledge of system principles with procedures for using this knowledge in a specific context. In general, teaching expert performance might require a basic conceptual explanation of how things work, practice in carrying out basic procedures, and varied experience for tuning procedural knowledge and developing persistence and confidence (Gentner & Stevens, 1983; Greeno & Simon, 1988).

    Kieras and Bovair (1984) demonstrated that providing students with conceptual models of a complex system before giving them information on how to use that system produced better recall, faster learning, and fewer errors in operating the system. Combined structural and functional descriptions of system operations are recommended for effective learning (Psotka, Massey, & Mutter, 1988). However, specific instructional strategies should be based on the cognitive requirements of particular tasks. The user does not always need complete knowledge of a system in order to operate it.

    For example, many experts in technical areas have a very limited understanding of general physics principles but perform their duties satisfactorily. If a device is simple, or a procedure is easily learned and practiced (e.g., using a telephone), there may be no need to provide a device model, because the user may infer a usable model without instruction (Kieras & Bovair, 1984). Operating and troubleshooting systems with simple functions requires only limited underlying knowledge and understanding of how those functions are fulfilled. For more complex systems, a deeper understanding of their components and operation is required (Lesgold & Lajoie, 1991).

    Novices often have difficulty integrating general theoretical concepts with their intuitions. This may happen because of conflicts between the everyday meanings of new concepts (e.g., acceleration, mass) and their meaning in theory (Reif, 1987), because of conflicts between students' intuitive knowledge and theoretical laws (diSessa, 1982), or because of a lack of procedural knowledge for solving specific problems, which is often not explicitly taught (Heller & Reif, 1984).

    There have been two major approaches to using the results of cognitive research on knowledge structures in the design of instructional systems (Glaser, 1990). The first approach was developed in the tradition of knowledge engineering in artificial intelligence and the design of expert systems. It requires exposing the learner to the knowledge characteristic of well-developed expertise. A well-known example of a computer-based instructional system designed in accordance with this approach is the GUIDON project (Clancey & Letsinger, 1984).

    The second approach was developed in cognitive science and is based on cognitive models of students' knowledge. For example, in instructional systems based on qualitative models (Chi, 1988; Forbus & Gentner, 1986), a learner has to progress from simple to more sophisticated domain-specific conceptual models (e.g., coordinated functional, causal, and structural models; qualitative and quantitative models). This progression occurs in the context of solving specially designed problems of gradually increasing complexity. An example of this approach is QUEST, a program for teaching the troubleshooting of electric circuits (White & Frederiksen, 1986).

    Similar ideas were realized in the STEAMER project, a simulator for training engineers to operate steam propulsion plants aboard large naval ships. The primary goal was to teach a robust conceptual model (rather than specific procedures) that could be used to reason about the steam plant qualitatively (Holland, Hutchins, McCandless, Rosenstein, & Weitzman, 1987). Abstract graphic images of the steam plant were organized hierarchically, with the major plant parameters presented first, followed by more detailed simulations of subsystem components.

    SHERLOCK is an example of a coached-practice learning environment in which learners compare their own performance with expert performance (Gabrys, Weiner, & Lesgold, 1993; Lesgold & Lajoie, 1991). Such reflection, however, may place a large demand on working memory if solution paths are long or complicated. SHERLOCK supports reflection by replaying the trainee's and an expert's performance. During replay, the system provides a summary of the information the user obtained in previous steps. The system allows learners to observe the expert's decision process, the reasons behind it, and the overall goal structure of the expert's performance. This technique reduces the cognitive load associated with remembering the details of the trainee's own performance while observing the expert's actions (Gabrys et al., 1993).

    Another well-known example of a similar approach is the model-tracing methodology in intelligent tutoring systems (Anderson, 1993). The tutoring system simulates a student’s cognitive behavior in real time and maintains a model of the student's knowledge state. It provides an example-based learning environment in which students can induce rules from examples. The learner's actual performance is compared with the ideal solution structure (a production-rule model), and the student is kept on the correct solution path. The tutor estimates the availability of acquired productions based on their correct and incorrect applications and selects appropriate problems for exercises. Many tutoring programs based on the model-tracing methodology have been used effectively in programming, geometry proofs, and algebraic equation solving (Anderson, Boyle, & Reiser, 1985; Anderson & Corbett, 1993; Anderson, Corbett, Fincham, Hoffman, & Pelletier, 1992; Anderson, Farrell, & Sauers, 1984).
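    The core loop of model tracing can be sketched in simplified form: compare each learner step with the next step of an ideal solution, keep the learner on that path, and tally correct and incorrect rule applications. This is a minimal sketch and not Anderson's implementation. The `ModelTracer` class, the rule names, and the linear solution path are hypothetical. The "availability" estimate here is simply the proportion of correct applications, which is a deliberate simplification and not the estimation procedure used by the cited tutors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelTracer:
    """Hypothetical model tracer: compares each learner step with the next step
    of an ideal production-rule solution and keeps the learner on that path."""
    ideal_path: List[Tuple[str, str]]  # (production rule, expected step)
    correct: Dict[str, int] = field(default_factory=dict)
    incorrect: Dict[str, int] = field(default_factory=dict)
    position: int = 0

    def trace(self, step: str) -> str:
        if self.position >= len(self.ideal_path):
            return "done"
        rule, expected = self.ideal_path[self.position]
        if step == expected:
            self.correct[rule] = self.correct.get(rule, 0) + 1
            self.position += 1  # on path: advance to the next expected step
            return "ok"
        self.incorrect[rule] = self.incorrect.get(rule, 0) + 1
        return f"hint: apply {rule}"  # off path: do not advance, give a hint

    def availability(self, rule: str) -> float:
        """Crude estimate: proportion of correct applications of a rule."""
        c = self.correct.get(rule, 0)
        n = c + self.incorrect.get(rule, 0)
        return c / n if n else 0.0

    def weakest_rule(self) -> str:
        """Rule with the lowest estimate, e.g. to choose the next exercise."""
        rules = sorted({rule for rule, _ in self.ideal_path})
        return min(rules, key=self.availability)


tracer = ModelTracer([("subtract-constant", "4x = 1"),
                      ("divide-by-coefficient", "x = 1/4")])
results = [tracer.trace(s) for s in ["4x = 5", "4x = 1", "x = 1/4"]]
print(results)
print(tracer.availability("subtract-constant"), tracer.weakest_rule())
```

    In this run, the learner's first wrong step triggers a hint and the tracer does not advance, so the learner stays on the ideal path. The rule that was misapplied ends up with the lowest estimate, and `weakest_rule` selects it as the focus for the next exercise.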

    COGNITIVE MODELS OF DEVELOPMENT OF EXPERTISE AND INSTRUCTIONAL DESIGN

    Cognitive studies of human performance and learning have demonstrated that learning processes are supported by a basic cognitive architecture that includes a powerful long-term memory and a limited working memory. Schema acquisition and automation, the major learning mechanisms, are critical in the formation of intellectual skills. Studies of chess skill and of other domains indicate that our knowledge base provides the foundation of intellectual skills. Schemas held in long-term memory allow experts to avoid processing overwhelming amounts of information in working memory and thus bypass working memory limitations.

    Automatic processin