The interaction plateau

CPI 494, April 9, 2009

Kurt VanLehn


Schematic of a natural language tutoring system, AutoTutor

Step start → T: Tell → Step end

Step start → T: Elicit → S: Correct → Step end

Remediation: S: Incorrect → T: Hint or prompt (T: Tell only if out of hints)
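The AutoTutor step loop in the schematic above can be sketched in Python. This is a minimal illustration, not AutoTutor's code: the function name `tutor_step`, the list of hint strings, and representing each student reply as a simple correct/incorrect boolean are all assumptions.

```python
from typing import Iterable, List


def tutor_step(replies: Iterable[bool], hints: List[str]) -> List[str]:
    """Simulate one step of an elicit / hint-or-prompt / tell loop.

    Hypothetical sketch: `replies` stands in for the student's successive
    answers (True = correct), and `hints` is the tutor's ordered list of
    hints and prompts for this step.
    """
    transcript = ["T: Elicit"]
    remaining = list(hints)
    for correct in replies:
        if correct:
            transcript.append("S: Correct")   # step ends
            return transcript
        transcript.append("S: Incorrect")
        if not remaining:
            transcript.append("T: Tell")      # only once out of hints
            return transcript
        transcript.append("T: " + remaining.pop(0))  # next hint or prompt
    return transcript  # student stopped answering mid-step


print(tutor_step([False, False, True], ["Hint 1", "Prompt 1"]))
# ['T: Elicit', 'S: Incorrect', 'T: Hint 1', 'S: Incorrect', 'T: Prompt 1', 'S: Correct']
print(tutor_step([False, False], ["Hint 1"]))
# ['T: Elicit', 'S: Incorrect', 'T: Hint 1', 'S: Incorrect', 'T: Tell']
```

The tutor only tells when the hint list is exhausted, which mirrors the "only if out of hints" annotation.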

Schematic of other natural language tutors, e.g., Atlas, Circsim-Tutor, Kermit-SE

Step start → T: Tell → Step end

Remediation: S: Incorrect → T: What is…? → S: I don’t know. → T: Well, what is… → S: … → T: …

Often called a KCD: knowledge construction dialogue
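One way to picture a KCD is as a small finite-state network in which each node is a tutor question and the student's answer selects the next node. The Python sketch below is a hypothetical illustration that assumes exact-match answer checking and reuses questions from the pumpkin dialogue later in this talk; the node names, the `Node` class, and the `run_kcd` function are invented for the example and do not describe how Atlas, Circsim-Tutor, or Kermit-SE are implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    question: str
    accepted: Set[str]          # answers treated as correct (exact match)
    on_correct: Optional[str]   # next node, or None to end the dialogue
    on_other: str               # remedial node for wrong answers or "I don't know"


# Hypothetical KCD built from the pumpkin problem.
KCD: Dict[str, Node] = {
    "speed": Node("Before you toss the pumpkin, do you and it have the same speed?",
                  {"yes"}, "force", "speed_hint"),
    "speed_hint": Node("Well, what is the pumpkin's speed while you carry it?",
                       {"the same as mine", "same as mine"}, "force", "speed_hint"),
    "force": Node("When you toss it up, is the net force on it exactly vertical?",
                  {"yes"}, None, "force_hint"),
    "force_hint": Node("You exert a force on the pumpkin, right? Which way does it point?",
                       {"up", "straight up"}, None, "force_hint"),
}


def run_kcd(kcd: Dict[str, Node], start: str, answers: List[str]) -> List[str]:
    """Return the nodes visited while feeding scripted student answers."""
    path: List[str] = []
    node_id: Optional[str] = start
    for answer in answers:
        if node_id is None:          # dialogue already finished
            break
        path.append(node_id)
        node = kcd[node_id]
        normalized = answer.strip().lower()
        node_id = node.on_correct if normalized in node.accepted else node.on_other
    return path


print(run_kcd(KCD, "speed", ["I don't know.", "Same as mine", "Yes"]))
# ['speed', 'speed_hint', 'force']
```

A real system would need far more robust answer classification than exact string matching; the network shape is the point here.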

Hypothesized ranking of tutoring, most effective first

A. Expert human tutors

B. Ordinary human tutors

C. Natural language tutoring systems

D. Step-based tutoring systems

E. Answer-based tutoring systems

F. No tutoring


Hypothesized effect sizes

[Figure: hypothesized learning gains (y-axis, 0 to 2.5) for No tutoring, Answer-based tutoring, Step-based tutoring, Natural language tutoring, Ordinary human tutors, and Expert human tutors]
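The learning gains in these charts are effect sizes in standard-deviation ("sigma") units. A common way to compute such an effect size is Cohen's d with a pooled standard deviation, sketched below in Python. The slides do not say which formula the cited studies used, so this formula choice is an assumption, and the scores are made up.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical post-test scores (fraction correct) for two small groups.
tutored = [0.7, 0.8, 0.9]
untutored = [0.5, 0.6, 0.7]
print(round(cohens_d(tutored, untutored), 2))  # 2.0
```

Here the groups differ by 0.2 and each has a standard deviation of 0.1, so the effect size is 2 sigma.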


[Figure: the same learning-gains chart, with a Classroom baseline added]

Bloom’s (1984) 2-sigma: 4 weeks of human tutoring vs. classroom
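If scores are roughly normally distributed, an effect size of d places the average student in the treatment group at the Φ(d) percentile of the control group, where Φ is the standard normal CDF. The short Python check below, which rests on that normality assumption, shows why Bloom's 2 sigma is usually described as the average tutored student outperforming about 98% of classroom students.

```python
from statistics import NormalDist


def percentile_of_average_treated(d: float) -> float:
    """Share of the control group scoring below the average treated student,
    assuming normally distributed scores with equal spread."""
    return NormalDist().cdf(d)


print(f"{percentile_of_average_treated(2.0):.1%}")  # 97.7%
```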


[Figure: the same learning-gains chart, with the Classroom baseline]

Kulik (1984) meta-analysis of CAI vs. classroom: 0.4 sigma


[Figure: the same learning-gains chart, with the Classroom baseline]

Many intelligent tutoring systems, e.g., Andes (VanLehn et al., 2005) and Carnegie Learning’s tutors…

My main claim: There is an interaction plateau

[Figure: expected vs. observed learning gains (y-axis, 0 to 2.5)]

A problem and its steps

Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?

1. Initially, you and the pumpkin have the same horizontal velocity.

2. Your throw exerts a net force vertically on the pumpkin.

3. Thus causing a vertical acceleration.

4. Which leaves the horizontal velocity unaffected.

5. So when the pumpkin falls, it has traveled the same distance horizontally as you have.

6. Thus, it lands in your hands.
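The reasoning in steps 1 to 5 can be checked numerically. The Python sketch below advances the pumpkin's motion in small time increments, assuming no air resistance, that the pumpkin is caught at the height it was released, and arbitrary example speeds; after release, gravity is the only force, and it changes only the vertical velocity.

```python
def simulate(runner_speed=3.0, throw_speed=5.0, g=9.81, dt=1e-3):
    """Return (flight time, pumpkin x, runner x) when the pumpkin is back at hand height."""
    px, py = 0.0, 0.0                       # pumpkin starts in the runner's hands
    vx, vy = runner_speed, throw_speed      # step 1: same horizontal velocity as the runner
    rx, t = 0.0, 0.0
    while True:
        vy -= g * dt                        # after release only gravity acts, so only vy changes
        px += vx * dt                       # step 4: horizontal velocity is never modified
        py += vy * dt
        rx += runner_speed * dt             # runner keeps a constant speed
        t += dt
        if py <= 0.0:                       # back down at hand height
            return t, px, rx


t, pumpkin_x, runner_x = simulate()
print(f"flight {t:.2f} s, pumpkin at {pumpkin_x:.2f} m, runner at {runner_x:.2f} m")
```

The pumpkin and the runner cover the same horizontal distance during the flight, so the pumpkin lands in the runner's hands (step 6).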

A dialogue between a human tutor (T) and human student (S)

Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?

S: Behind me.

T: Hmm. Let’s think about that. Before you toss the pumpkin and are just carrying it, do you and the pumpkin have the same speed?

S: Yes.

T: Good. When you toss it up, is the net force on it exactly vertical?

S: I’m not sure.

T: You exert a force on the pumpkin, right?

Etc.

Schematic of dialogue about a single step

Step start → T: Tell → Step end

Remediation: S: Incorrect → T: Hint, or prompt, or explain, or analogy, or …

Comparisons of expert to novice human tutors

Step start → T: Tell → Step end

Remediation loop, shown separately for Novices and Experts

Experts may have a wider variety of remediation tactics

Schematic of an ITS handling of a single step

Step start → T: Tell → Step end

Step start → S: Correct → Step end

S: Incorrect → T: Hint

Major differences

Low-interaction tutoring (e.g., CAI)
– Remediation on answer only

Step-based interaction (e.g., ITS)
– Remediation on each step
– Hint sequence, with final “bottom out” hint

Natural tutoring (e.g., human tutoring)
– Remediation on each step, substep, inference…
– Natural language dialogues
– Many tutorial tactics
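The difference in remediation granularity between the first two kinds of tutoring can be made concrete with a small Python sketch. It assumes a hypothetical record of how many incorrect entries a student makes on each step before getting it right, and a hypothetical hint sequence whose last element is the bottom-out hint; the function names are invented for illustration.

```python
from typing import List, Tuple


def answer_based_remediation(final_answer_correct: bool) -> List[Tuple[str, str]]:
    """Low interaction: only the final answer is checked, so at most one remediation."""
    return [] if final_answer_correct else [("answer", "Your answer is incorrect.")]


def step_based_remediation(wrong_entries: List[int], hints: List[str]) -> List[Tuple[int, str]]:
    """Step-based: each incorrect entry on a step triggers the next hint in the
    sequence; the last hint is the bottom-out hint, after which the step is done."""
    events = []
    for step, wrong in enumerate(wrong_entries, start=1):
        for hint in hints[:wrong]:          # never more hints than the sequence holds
            events.append((step, hint))
    return events


hints = ["Hint 1", "Hint 2", "Bottom-out hint"]
print(answer_based_remediation(final_answer_correct=False))
print(step_based_remediation([0, 2, 0, 5], hints))
```

The answer-based tutor can react only once, to the final answer, while the step-based tutor reacts at steps 2 and 4, and at step 4 it reaches the bottom-out hint.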

Conditions (VanLehn, Graesser et al., 2007)

Natural tutoring
– Expert human tutors
  » Typed
  » Spoken
– Natural language dialogue computer tutors
  » Why2-AutoTutor (Graesser et al.)
  » Why2-Atlas (VanLehn et al.)

Step-based interaction
– Canned text remediation

Low interaction
– Textbook

Human tutors (a form of natural tutoring)

Step start → T: Tell → Step end

Why2-Atlas (a form of natural tutoring)

Step start → T: Tell → Step end

S: Incorrect → A knowledge construction dialogue

Why2-AutoTutor (a form of natural tutoring)

Step start → T: Tell → Step end

S: Incorrect → Hint or prompt

Canned-text remediation (a form of step-based interaction)

Step start → T: Tell → Step end

S: Incorrect → Text

Experiment 1: Intermediate students & instruction

[Figure: adjusted post-test scores (y-axis, 0 to 1) on multiple-choice and essay tests for Human tutors (N=18), Why2-Atlas (N=22), Why2-AutoTutor (N=24), and Canned text remediation (N=22)]


[Figure: the same Experiment 1 chart of adjusted post-test scores]

No reliable differences
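The slides report post-test scores adjusted for pretest but do not say how the adjustment was done. A common method is analysis of covariance (ANCOVA), which shifts each group's post-test mean by a pooled within-group regression slope times the group's pretest deviation from the grand pretest mean. The Python sketch below assumes that method and uses made-up scores.

```python
from statistics import mean


def adjusted_means(groups):
    """ANCOVA-style adjusted post-test means.

    groups maps a condition name to a list of (pretest, posttest) pairs.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():           # pooled within-group slope
        mx = mean(pre for pre, _ in pairs)
        my = mean(post for _, post in pairs)
        sxy += sum((pre - mx) * (post - my) for pre, post in pairs)
        sxx += sum((pre - mx) ** 2 for pre, _ in pairs)
    slope = sxy / sxx
    grand_pre = mean(pre for pairs in groups.values() for pre, _ in pairs)
    return {
        name: mean(post for _, post in pairs)
        - slope * (mean(pre for pre, _ in pairs) - grand_pre)
        for name, pairs in groups.items()
    }


# Hypothetical data: group B starts with higher pretest scores.
scores = {"A": [(0.2, 0.5), (0.4, 0.7)], "B": [(0.4, 0.6), (0.6, 0.8)]}
print({k: round(v, 2) for k, v in adjusted_means(scores).items()})  # {'A': 0.7, 'B': 0.6}
```

Raw means favor B (0.70 vs. 0.60), but after adjusting for B's higher pretest the order reverses, which is why comparisons like these use adjusted rather than raw post-test scores.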

Experiment 2: AutoTutor > Textbook = Nothing

[Figure: adjusted post-test scores (y-axis, 0 to 1) on multiple-choice and essay tests for AutoTutor, Textbook, and Nothing]

Reliably different

Experiments 1 & 2 (VanLehn, Graesser et al., 2007)

[Figure: adjusted post-test scores (y-axis, 0 to 1) for read-only textbook studying, step-based computer tutoring, Why2-AutoTutor, Why2-Atlas, and human tutoring]

No significant differences


[Figure: scores (y-axis, 0 to 1) for Why2-AutoTutor (N=32) and Canned Text Remediation (N=30) on multiple-choice, near-transfer essay, far-transfer essay, retention multiple-choice, and retention essay tests]

Deeper assessments


[Figure: the same chart, Why2-AutoTutor (N=32) vs. Canned Text Remediation (N=30)]

Experiment 4: Novice students & intermediate instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice and essay tests for spoken human tutoring (N=14) and typed human tutoring (N=20)]

Relearning


[Figure: the same chart, spoken human tutoring (N=14) and typed human tutoring (N=20)]

All differences reliable

Experiment 5: Novice students & intermediate (but shorter) instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near-transfer essay, and far-transfer essay tests for spoken human tutoring (N=21) and Why2-Atlas (N=21)]

Relearning



Experiment 5: Low-pretest students only

[Figure: scores, y-axis 0 to 1]

Aptitude-treatment interaction?
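An aptitude-treatment interaction means the size of a treatment effect depends on a student characteristic such as pretest score. A simple way to look for one is to estimate the effect separately for low- and high-pretest students. The Python sketch below assumes a median split on pretest, uses made-up data, and abbreviates canned text remediation as CTR; the slides do not say how "low-pretest" was defined.

```python
from statistics import mean, median


def effect_by_pretest(students):
    """Tutoring-minus-CTR mean post-test difference for low- and high-pretest halves.

    students: list of (condition, pretest, posttest), condition in {"tutoring", "ctr"}.
    """
    cut = median(pre for _, pre, _ in students)
    halves = {"low": lambda pre: pre < cut, "high": lambda pre: pre >= cut}
    effects = {}
    for label, in_half in halves.items():
        tut = [post for cond, pre, post in students if cond == "tutoring" and in_half(pre)]
        ctr = [post for cond, pre, post in students if cond == "ctr" and in_half(pre)]
        effects[label] = mean(tut) - mean(ctr)
    return effects


# Hypothetical data: tutoring helps only the students who start low.
students = [
    ("tutoring", 0.2, 0.60), ("tutoring", 0.3, 0.65), ("tutoring", 0.7, 0.80), ("tutoring", 0.8, 0.85),
    ("ctr", 0.2, 0.40), ("ctr", 0.3, 0.45), ("ctr", 0.7, 0.80), ("ctr", 0.8, 0.85),
]
print({k: round(v, 3) for k, v in effect_by_pretest(students).items()})  # {'low': 0.2, 'high': 0.0}
```

A real analysis would test the interaction statistically rather than eyeball two differences, but the split shows what pattern the question on the slide is asking about.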

Experiment 5: Low-pretest students only

[Figure: scores, y-axis 0 to 1]

Spoken human tutoring > canned text remediation

Experiments 6 and 7: Novice students & novice instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, fill-in-the-blank, and essay tests for Why2-AutoTutor, CTR (canned text remediation, expt 6), Text only, Why2-Atlas, and CTR (expt 7)]

Was the intermediate text over the novice students’ heads?

Experiments 6 and 7: Novice students & novice instruction

[Figure: the same chart for Why2-AutoTutor, CTR (expt 6), Text only, Why2-Atlas, and CTR (expt 7)]

No reliable differences

Interpretation

[Figure: grid of student groups (intermediates and novices, each split into high-pretest and low-pretest) against content complexity (Experiments 1 & 4, Experiments 3 & 5, Experiments 6 & 7)]

Legend:
– Students can follow the reasoning only with the tutor’s help (zone of proximal development, ZPD); predict: Tutoring > Canned text remediation
– Students can follow the reasoning without any help; predict: Tutoring = Canned text remediation

Original research questions

Can natural language tutorial dialog add pedagogical value?
– Yes, when students must study content that is too complex to be understood by reading alone

How feasible is a deep linguistic tutoring system?
– We built it. It’s fast enough to use.

Can deep linguistic and dialog techniques add pedagogical value?

When content is too complex to learn by reading alone: Deep > Shallow?

[Figure: scores, y-axis 0 to 1]

Why2-Atlas is not clearly better than Why2-AutoTutor

When to use deep vs. shallow?

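Sentence understanding
– Shallow linguistic: LSA, Rainbow, Rappel
– Deep linguistic: Carmel (parser, semantics…)
– Recommendation: Use both

Essay/discourse understanding
– Shallow linguistic: LSA
– Deep linguistic: Abduction, Bnets
– Recommendation: Use deep

Dialog management
– Shallow linguistic: Finite state networks
– Deep linguistic: Reactive planning
– Recommendation: Use locally smart FSA

Natural language generation
– Shallow linguistic: Text
– Deep linguistic: Plan-based
– Recommendation: Use equivalent texts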
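To give a feel for the shallow end of the sentence-understanding row, here is a bag-of-words cosine similarity between a student's sentence and an expected answer, in Python. This is an assumed simplification, not LSA: LSA additionally maps word counts into a reduced semantic space learned from a large corpus (via singular value decomposition), which this sketch omits, and the example sentences are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; punctuation is ignored."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


expectation = bag_of_words("The horizontal velocity of the pumpkin is unaffected.")
for answer in ["Its horizontal velocity is unaffected.", "It lands behind me."]:
    print(f"{cosine(bag_of_words(answer), expectation):.2f}  {answer}")
```

Because it only rewards word overlap, a matcher like this would also score "the horizontal velocity is not unaffected" highly, which is one reason the table recommends combining shallow and deep techniques for sentence understanding.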

Results from all 7 experiments (VanLehn, Graesser et al., 2007)

Why2: Atlas = AutoTutor

Why2 > Textbook
– No essays
– Content differences

Human tutoring = Why2 = Canned text remediation
– Except when novice students worked with instruction designed for intermediates; then Human tutoring > Canned text remediation

Other evidence for the interaction plateau (Evens & Michael, 2006)

[Figure: mean gain (y-axis, 0 to 6) for Reading (1993), Reading (1999), Reading (2002), Circsim (1999), Circsim-Tutor (1999), Circsim-Tutor (2002), Human tutors (1999), and Human tutors (1993)]

No significant differences

Other evidence for the interaction plateau (Reif & Scott, 1999)

[Figure: scores (y-axis, 0 to 100) for Untutored, Step-based tutoring, and Human tutoring]

Other evidence for the interaction plateau (Chi, Roy & Hausmann, in press)

[Figure: adjusted deep post-test steps, in % (y-axis, 0 to 70), for Individuals + video, Individuals + textbook, Pairs + textbook, Pairs + video, and Human tutoring]

Still more studies where natural tutoring = step-based interaction

Human tutors

1. Human tutoring = human tutoring with only content-free prompting for step remediation (Chi et al., 2001)

2. Human tutoring = canned text during post-practice remediation (Katz et al., 2003)

3. Socratic human tutoring = didactic human tutoring (Rosé et al., 2001a)

4. Socratic human tutoring = didactic human tutoring (Johnson & Johnson, 1992)

5. Expert human tutoring = novice human tutoring (Chae, Kim & Glass, 2005)

Natural language tutoring systems

1. Andes-Atlas = Andes with canned text (Rosé et al., 2001b)

2. Kermit = Kermit with dialogue explanations (Weerasinghe & Mitrovic, 2006)

Hypothesis 1: Exactly how tutors remedy a step doesn’t matter much

Step start → T: Tell → Step end

S: Incorrect → remediation

What’s in here doesn’t matter much

Main claim: There is an interaction plateau

[Figure: expected vs. observed learning gains (y-axis, 0 to 2.5) for low-interaction instruction, step-based instruction, and natural tutoring]

Hypothesis 1

Hypothesis 2: Cannot eliminate the step remediation loop

Step start → T: Tell → Step end

S: Incorrect

Must avoid this

Main claim: There is an interaction plateau

[Figure: expected vs. observed learning gains (y-axis, 0 to 2.5) for low-interaction instruction, step-based instruction, and natural tutoring]

Hypothesis 2

Conclusions

What does it take to make computer tutors as effective as human tutors?
– Step-based interaction
– Bloom’s 2-sigma results may have been due to weak control conditions (classroom instruction)
– Other evaluations have also used weak controls

When is natural language useful?
– For steps themselves (vs. menus, algebra…)
– NOT for feedback & hints (remediation) on steps

Future directions for tutoring systems research

Making step-based instruction ubiquitous
– Authoring & customizing
– Novel task domains

Increasing engagement

Final thought

Many people “just know” that more interaction produces more learning.

“It ain’t so much the things we don’t know that get us into trouble. It’s the things we know that just ain’t so.” – Josh Billings (a.k.a. Henry Wheeler Shaw)
