
The interaction plateau

CPI 494, April 9, 2009

Kurt VanLehn

1

2

Schematic of a natural language tutoring system, AutoTutor

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → Remediation: T: Hint or prompt → back to T: Elicit. T: Tell → Step end, only if out of hints.]

3

Schematic of other natural language tutors, e.g., Atlas, Circsim-Tutor, Kermit-SE

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → Remediation: a sub-dialogue (T: What is…? S: I don’t know. T: Well, what is…? S: … T: …) → back to T: Elicit. T: Tell → Step end, only if out of hints.]

Often called a KCD: Knowledge construction dialogue
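In control-flow terms, a KCD is a scripted sequence of increasingly pointed questions, with the answer told outright only as a last resort. The Python below is a minimal, hypothetical sketch of that structure, assuming a crude keyword matcher; it is not code from Atlas, Circsim-Tutor, or Kermit-SE, and every name in it (KCDQuestion, run_kcd, matches) is illustrative.

# Minimal sketch of a knowledge construction dialogue (KCD), assumed for
# illustration only; not code from Atlas, Circsim-Tutor, or Kermit-SE.
# When the student misses a step, the tutor walks through scripted
# simpler questions and tells the answer only as a last resort.

from dataclasses import dataclass, field

@dataclass
class KCDQuestion:
    prompt: str                                   # what the tutor asks
    expected: set = field(default_factory=set)    # keywords counted as correct
    tell: str = ""                                # bottom-out statement

def matches(answer: str, expected: set) -> bool:
    # Crude keyword overlap, standing in for real answer assessment.
    return bool(set(answer.lower().split()) & expected)

def run_kcd(questions, ask):
    # Ask each scripted question; re-prompt once, then tell.
    for q in questions:
        if matches(ask(q.prompt), q.expected):
            continue                              # correct: next question
        if not matches(ask("Well, " + q.prompt), q.expected):
            print("T:", q.tell)                   # out of questions: tell

if __name__ == "__main__":
    script = [KCDQuestion(
        "what is the net force on the pumpkin after it leaves your hand?",
        expected={"gravity", "weight", "vertical"},
        tell="The only force on it is gravity, which is vertical.")]
    run_kcd(script, ask=lambda prompt: input("T: " + prompt + "\nS: "))

A real KCD would nest these questions several levels deep and branch on the kind of wrong answer, but the skeleton (elicit, re-elicit, then tell) is the same one shown in the schematic above.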

4

Hypothesized ranking of tutoring, most effective first

A. Expert human tutors

B. Ordinary human tutors

C. Natural language tutoring systems

D. Step-based tutoring systems

E. Answer-based tutoring systems

F. No tutoring

5

Hypothesized effect sizes

[Bar chart: learning gains (effect size, 0 to 2.5) for no tutoring, answer-based tutoring, step-based tutoring, natural language tutoring, ordinary human tutors, and expert human tutors]

6

Hypothesized effect sizes

[Same bar chart of hypothesized learning gains, with annotation: Bloom’s (1984) 2-sigma, i.e., 4 weeks of human tutoring vs. classroom; classroom instruction is the baseline]
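As a reminder of what these gains mean: the slides report standardized effect sizes, i.e., differences between condition means measured in standard deviation units. The slides do not spell out the formula; the usual Cohen-style definition is:

d = \frac{\bar{x}_{\text{tutored}} - \bar{x}_{\text{comparison}}}{s_{\text{comparison}}}

So Bloom’s 2-sigma result says the average tutored student scored roughly two standard deviations above the average classroom student.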

7

Hypothesized effect sizes

[Same bar chart of hypothesized learning gains, classroom baseline, with annotation: Kulik (1984) meta-analysis of CAI vs. classroom, 0.4 sigma]

8

Hypothesized effect sizes

[Same bar chart of hypothesized learning gains, classroom baseline, with annotation: many intelligent tutoring systems, e.g., Andes (VanLehn et al., 2005), Carnegie Learning’s tutors…]

9

My main claim: There is an interaction plateau

[Chart: expected vs. observed learning gains (0 to 2.5) across the tutoring types above]

10

A problem and its steps

Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?

1. Initially, you and the pumpkin have the same horizontal velocity.
2. Your throw exerts a net force vertically on the pumpkin.
3. Thus causing a vertical acceleration.
4. Which leaves the horizontal velocity unaffected.
5. So when the pumpkin falls, it has traveled the same distance horizontally as you have.
6. Thus, it lands in your hands.

11

A dialogue between a human tutor (T) and human student (S)

Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?

S: Behind me.
T: Hmm. Let’s think about that. Before you toss the pumpkin and are just carrying it, do you and the pumpkin have the same speed?
S: Yes.
T: Good. When you toss it up, is the net force on it exactly vertical?
S: I’m not sure.
T: You exert a force on the pumpkin, right?
Etc.

12

Schematic of dialogue about a single step

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → Remediation: T: Hint, or prompt, or explain, or analogy, or … → back to T: Elicit. T: Tell → Step end.]

13

Comparisons of expert to novice human tutors

[Same flowchart of one step, with the remediation repertoire (hint, prompt, explain, analogy, …) marked for both novice and expert tutors]

Experts may have a wider variety

14

Schematic of an ITS handling of a single step

[Flowchart of one step: Step start → S: Correct → Step end. If S: Incorrect → T: Hint → back to the step. T: Tell → Step end, only if out of hints.]
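Because the ITS version of the loop is so constrained (attempt, flag the error, give the next hint, tell when the hints run out), it can be written down directly. The Python below is a minimal sketch of that control flow under assumed hooks (get_attempt, is_correct); it is not code from Andes or any other system named in these slides.

# Minimal sketch (assumed, not from Andes or any cited ITS) of step-based
# remediation: each wrong attempt yields the next hint in a fixed sequence,
# ending with a "bottom-out" hint that simply tells the step.

def tutor_one_step(step_id, hints, bottom_out, get_attempt, is_correct):
    """Run the remediation loop for a single step.

    hints       -- ordered hint strings, vaguest first
    bottom_out  -- final hint that states the step outright ("T: Tell")
    get_attempt -- callable returning the student's next attempt at the step
    is_correct  -- callable judging an attempt (menu choice, equation, etc.)
    """
    hint_index = 0
    while True:
        attempt = get_attempt(step_id)
        if is_correct(step_id, attempt):
            return True                      # step end: student got it
        if hint_index < len(hints):
            print("T:", hints[hint_index])   # give the next hint and re-elicit
            hint_index += 1
        else:
            print("T:", bottom_out)          # out of hints: tell the step
            return False                     # step end: tutor had to tell

if __name__ == "__main__":
    tutor_one_step(
        step_id="horizontal-velocity",
        hints=["What happens to the pumpkin's horizontal velocity?",
               "Is there any horizontal force on the pumpkin?"],
        bottom_out="With no horizontal force, the horizontal velocity is unchanged.",
        get_attempt=lambda s: input("S: "),
        is_correct=lambda s, a: "unchanged" in a.lower() or "same" in a.lower(),
    )

The hint list is ordered from vaguest to most specific; the bottom-out hint is the “T: Tell” arc in the schematic.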

15

Major differences

Low-interaction tutoring (e.g., CAI)
– Remediation on answer only

Step-based interaction (e.g., ITS)
– Remediation on each step
– Hint sequence, with final “bottom out” hint

Natural tutoring (e.g., human tutoring)
– Remediation on each step, substep, inference…
– Natural language dialogues
– Many tutorial tactics

16

Conditions (VanLehn, Graesser et al., 2007)

Natural tutoring
– Expert human tutors
  » Typed
  » Spoken
– Natural language dialogue computer tutors
  » Why2-AutoTutor (Graesser et al.)
  » Why2-Atlas (VanLehn et al.)

Step-based interaction
– Canned text remediation

Low interaction
– Textbook

17

Human tutors (a form of natural tutoring)

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → T: Hint, or prompt, or explain, or analogy, or … → back to T: Elicit. T: Tell → Step end.]

18

Why2-Atlas (a form of natural tutoring)

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → a knowledge construction dialogue → back to T: Elicit. T: Tell → Step end.]

19

Why2-AutoTutor (a form of natural tutoring)

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → T: Hint or prompt → back to T: Elicit. T: Tell → Step end.]

20

Canned-text remediation (a form of step-based interaction)

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → canned text → back to T: Elicit. T: Tell → Step end.]

21

Experiment 1: Intermediate students & instruction

[Bar chart: adjusted post-test scores (0 to 1) on multiple-choice and essay tests, for human tutors (N=18), Why2-Atlas (N=22), Why2-AutoTutor (N=24), and canned text remediation (N=22)]

22

Experiment 1: Intermediate students & instruction

[Same bar chart as above]

No reliable differences

23

Experiment 2: AutoTutor > Textbook = Nothing

[Bar chart: adjusted post-test scores (0 to 1) on multiple-choice and essay tests, for AutoTutor, Textbook, and Nothing]

Reliably different

24

Experiments 1 & 2 (VanLehn, Graesser et al., 2007)

[Bar chart: adjusted post-test scores for read-only textbook studying, step-based computer tutoring, Why2-AutoTutor, Why2-Atlas, and human tutoring]

No significant differences

25

Experiment 3: Intermediate students & instruction

[Bar chart: adjusted post-test scores (0 to 1) on multiple choice, near-transfer essay, far-transfer essay, retention multiple choice, and retention essay, for Why2-AutoTutor (N=32) and canned text remediation (N=30)]

Deeper assessments

26

Experiment 3: Intermediate students & instruction

[Same bar chart as above]

No reliable differences

27

Experiment 4: Novice students & intermediate instruction

[Bar chart: adjusted post-test scores (0 to 1) on multiple-choice and essay tests, for spoken human tutoring (N=14), typed human tutoring (N=20), and canned text remediation (N=20)]

Relearning

28

Experiment 4: Novice students & intermediate instruction

[Same bar chart as above]

All differences reliable

29

Experiment 5: Novice students & intermediate (but shorter) instruction

[Bar chart: adjusted post-test scores (0 to 1) on multiple-choice, near essay, and far essay tests, for spoken human tutoring (N=21), Why2-Atlas (N=21), Why2-AutoTutor (N=21), and canned text remediation (N=19); annotation: Relearning]

30

Experiment 5: Novice students & intermediate instruction

[Same bar chart as above]

No reliable differences

31

Experiment 5: Low-pretest students only

[Bar chart: adjusted post-test scores (0 to 1) on multiple-choice, near essay, and far essay tests, for spoken human tutoring (N=9), Why2-Atlas (N=7), Why2-AutoTutor (N=10), and canned text remediation (N=11)]

Aptitude-treatment interaction?

32

Experiment 5: Low-pretest students only

[Same bar chart as above]

Spoken human tutoring > canned text remediation

33

Experiments 6 and 7: Novice students & novice instruction

[Bar chart: scores (0 to 1) on multiple-choice, fill-in-the-blank, and essay tests, for Why2-AutoTutor, canned text remediation (Experiment 6), text only, Why2-Atlas, and canned text remediation (Experiment 7)]

Was the intermediate text over the novice students’ heads?

34

Experiments 6 and 7: Novice students & novice instruction

[Same bar chart as above]

No reliable differences

35

Interpretation

[Diagram: Experiments 1 & 4, Experiments 3 & 5, and Experiments 6 & 7 placed by student population (intermediates vs. novices, each split into high-pretest and low-pretest) against content complexity.
Legend: can follow the reasoning only with the tutor’s help (ZPD): predict Tutoring > Canned text remediation; can follow the reasoning without any help: predict Tutoring = Canned text remediation.]

36

Original research questions

Can natural language tutorial dialog add pedagogical value?
– Yes, when students must study content that is too complex to be understood by reading alone

How feasible is a deep linguistic tutoring system?
– We built it. It’s fast enough to use.

Can deep linguistic and dialog techniques add pedagogical value?

37

When content is too complex to learn by reading alone: Deep > Shallow?

[Same bar chart as the low-pretest subset of Experiment 5: spoken human tutoring (N=9), Why2-Atlas (N=7), Why2-AutoTutor (N=10), and canned text remediation (N=11)]

Why2-Atlas is not clearly better than Why2-AutoTutor

38

When to use deep vs. shallow?

Sentence understanding: shallow = LSA, Rainbow, Rappel; deep = Carmel (parser, semantics, …). Use both.
Essay/discourse understanding: shallow = LSA; deep = abduction, Bnets. Use deep.
Dialog management: shallow = finite state networks; deep = reactive planning. Use a locally smart FSA.
Natural language generation: shallow = text; deep = plan-based. Use equivalent texts.
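To make the “shallow” column concrete, here is a deliberately simplified stand-in for shallow sentence understanding: classify the student’s sentence by bag-of-words cosine similarity against expected answers. LSA would first project these vectors into a low-dimensional space learned by SVD over a corpus, and Rainbow would use a trained classifier; this sketch, including its example sentences and threshold, is an assumption for illustration.

# Simplified stand-in for shallow sentence understanding: represent the
# student's sentence and each expected answer as bag-of-words vectors and
# pick the expected answer with the highest cosine similarity.
# (LSA would first project these vectors into a low-dimensional space
# learned by SVD over a corpus; that step is omitted here.)

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(student_sentence: str, expected_answers: dict, threshold: float = 0.3):
    """Return the best-matching expected answer label, or None if nothing is close."""
    student_vec = Counter(student_sentence.lower().split())
    scored = [(cosine(student_vec, Counter(text.lower().split())), label)
              for label, text in expected_answers.items()]
    best_score, best_label = max(scored)
    return best_label if best_score >= threshold else None

if __name__ == "__main__":
    expectations = {  # illustrative expected answers for the pumpkin problem
        "same-horizontal-velocity": "the pumpkin keeps the same horizontal velocity as the runner",
        "vertical-net-force": "the net force on the pumpkin is vertical",
    }
    print(classify("its horizontal velocity stays the same as mine", expectations))

In the deep column, Carmel’s parser and semantic interpretation would replace this matcher.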

39

Results from all 7 experiments (VanLehn, Graesser et al., 2007)

Why2: Atlas = AutoTutor

Why2 > Textbook
– No essays
– Content differences

Human tutoring = Why2 = Canned text remediation
– Except when novice students worked with instruction designed for intermediates; then Human tutoring > Canned text remediation

40

Other evidence for the interaction plateau (Evens & Michael, 2006)

[Bar chart: mean gain (0 to 6) for Reading (1993), Reading (1999), Reading (2002), Circsim (1999), Circsim-Tutor (1999), Circsim-Tutor (2002), Human tutors (1999), and Human tutors (1993)]

No significant differences

41

Other evidence for the interaction plateau (Reif & Scott, 1999)

[Bar chart: scores (0 to 100) for untutored, step-based tutoring, and human tutoring]

No significant differences

42

Other evidence for the interaction plateau (Chi, Roy & Hausmann, in press)

[Bar chart: adjusted deep post-test steps (%) for individuals + video, individuals + textbook, pairs + textbook, pairs + video, and human tutoring]

No significant differences

43

Still more studies where natural tutoring = step-based interaction

Human tutors
1. Human tutoring = human tutoring with only content-free prompting for step remediation (Chi et al., 2001)
2. Human tutoring = canned text during post-practice remediation (Katz et al., 2003)
3. Socratic human tutoring = didactic human tutoring (Rosé et al., 2001a)
4. Socratic human tutoring = didactic human tutoring (Johnson & Johnson, 1992)
5. Expert human tutoring = novice human tutoring (Chae, Kim & Glass, 2005)

Natural language tutoring systems
1. Andes-Atlas = Andes with canned text (Rosé et al., 2001b)
2. Kermit = Kermit with dialogue explanations (Weerasinghe & Mitrovic, 2006)

44

Hypothesis 1: Exactly how tutors remedy a step doesn’t matter much

[Flowchart of one step: Step start → T: Elicit → S: Correct → Step end. If S: Incorrect → remediation → back to T: Elicit. T: Tell → Step end. Annotation on the remediation box: what’s in here doesn’t matter much]

45

Main claim: There is an interaction plateau

[Chart: expected vs. observed learning gains (0 to 2.5) for low-interaction instruction, step-based instruction, and natural tutoring]

Hypothesis 1

46

Hypothesis 2: Cannot eliminate the step remediation loop

[Flowchart of one step as above, with the shortcut from S: Incorrect straight to T: Tell marked “must avoid this”, i.e., the step remediation loop cannot be eliminated]

47

Main claim: There is an interaction plateau

[Same chart of expected vs. observed learning gains]

Hypothesis 2

48

Conclusions

What does it take to make computer tutors as effective as human tutors?
– Step-based interaction
– Bloom’s 2-sigma results may have been due to weak control conditions (classroom instruction)
– Other evaluations have also used weak controls

When is natural language useful?
– For steps themselves (vs. menus, algebra…)
– NOT for feedback & hints (remediation) on steps

49

Future directions for tutoring systems research

Making step-based instruction ubiquitous
– Authoring & customizing
– Novel task domains

Increasing engagement

50

Final thought

Many people “just know” that more interaction produces more learning.

“It ain’t so much the things we don’t know that get us into trouble. It’s the things we know that just ain’t so.” – Josh Billings (aka. Henry Wheeler Shaw)