The interaction plateau CPI 494, April 9, 2009 Kurt VanLehn 1.

Page 1: The interaction plateau

CPI 494, April 9, 2009

Kurt VanLehn

Page 2: Schematic of a natural language tutoring system, AutoTutor

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation: T: Hint or prompt. T: Tell only if out of hints.]

Page 3: Schematic of other natural language tutors, e.g., Atlas, Circsim-Tutor, Kermit-SE

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation is a multi-turn dialogue: T: What is…? S: I don’t know. T: Well, what is… S: … T: … T: Tell only if out of hints.]

This remediation dialogue is often called a KCD: knowledge construction dialogue.

Page 4: Hypothesized ranking of tutoring, most effective first

A. Expert human tutors

B. Ordinary human tutors

C. Natural language tutoring systems

D. Step-based tutoring systems

E. Answer-based tutoring systems

F. No tutoring

Page 5: Hypothesized effect sizes

[Figure: hypothesized learning gains (y-axis, 0 to 2.5) for no tutoring, answer-based tutoring, step-based tutoring, natural language tutoring, ordinary human tutors, and expert human tutors]

Page 6: Hypothesized effect sizes

[Figure: hypothesized learning gains (y-axis, 0 to 2.5) for no tutoring through expert human tutors, with a classroom baseline]

Bloom’s (1984) 2-sigma: 4 weeks of human tutoring vs. classroom

Page 7: Hypothesized effect sizes

[Figure: hypothesized learning gains (y-axis, 0 to 2.5) for no tutoring through expert human tutors, with a classroom baseline]

Kulik (1984) meta-analysis of CAI vs. classroom: 0.4 sigma
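The “sigma” figures on these slides are effect sizes: a difference in mean scores divided by a standard deviation. The Python sketch below shows one common variant, Cohen’s d with a pooled standard deviation; it is an illustration under that assumption, and Bloom (1984) and Kulik (1984) may have computed their effect sizes differently (for example, dividing by the control group’s standard deviation). The function name `cohens_d` and the scores are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cohens_d(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Effect size in 'sigma' units: difference in means over the pooled SD.

    Uses sample standard deviations (n - 1 denominator), so each group
    needs at least two scores.
    """
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


# Hypothetical scores: the treatment group sits two SDs above the control group.
control_scores = [1, 3, 5]   # mean 3, SD 2
tutored_scores = [5, 7, 9]   # mean 7, SD 2
print(cohens_d(tutored_scores, control_scores))  # 2.0, i.e. "2 sigma"
```

On this scale, 0.4 sigma means the average treated student scored 0.4 standard deviations above the average control student.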

Page 8: Hypothesized effect sizes

[Figure: hypothesized learning gains (y-axis, 0 to 2.5) for no tutoring through expert human tutors, with a classroom baseline]

Many intelligent tutoring systems: e.g., Andes (VanLehn et al., 2005), Carnegie Learning’s tutors…

Page 9: My main claim: There is an interaction plateau

[Figure: expected vs. observed learning gains (y-axis, 0 to 2.5)]

Page 10: A problem and its steps

Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?

1. Initially, you and the pumpkin have the same horizontal velocity.
2. Your throw exerts a net force vertically on the pumpkin.
3. Thus causing a vertical acceleration.
4. Which leaves the horizontal velocity unaffected.
5. So when the pumpkin falls, it has traveled the same distance horizontally as you have.
6. Thus, it lands in your hands.
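The six-step argument can be checked with a short numerical simulation. This Python sketch assumes no air resistance and that the pumpkin is caught at the height it left your hands; `landing_offset` and the speeds used are illustrative choices, not part of the original problem. Because the only force after release is gravity, which is vertical, the loop never changes the pumpkin’s horizontal velocity.

```python
def landing_offset(run_speed: float, throw_speed: float,
                   g: float = 9.81, dt: float = 1e-4) -> float:
    """Horizontal distance between pumpkin and runner when the pumpkin comes back down.

    Simple Euler integration; assumes no air resistance, so after release the
    only force on the pumpkin is gravity, which acts vertically.
    """
    runner_x = pumpkin_x = 0.0
    vx = run_speed        # step 1: same horizontal velocity as the runner
    vy = throw_speed      # step 2: the throw adds only vertical velocity
    y = 0.0               # height above the release point
    while True:
        runner_x += run_speed * dt
        pumpkin_x += vx * dt          # step 4: vx is never changed
        vy -= g * dt                  # step 3: vertical acceleration only
        y += vy * dt
        if y <= 0.0:                  # back at hand height
            return pumpkin_x - runner_x   # step 5: same horizontal distance


print(landing_offset(run_speed=6.0, throw_speed=5.0))  # 0.0: it lands in your hands (step 6)
```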

Page 11: A dialogue between a human tutor (T) and human student (S)

Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?

S: Behind me.
T: Hmm. Let’s think about that. Before you toss the pumpkin and are just carrying it, do you and the pumpkin have the same speed?
S: Yes
T: Good. When you toss it up, is the net force on it exactly vertical?
S: I’m not sure.
T: You exert a force on the pumpkin, right?
Etc.

Page 12: Schematic of dialogue about a single step

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation: T: Hint, or prompt, or explain, or analogy, or …]

Page 13: Comparisons of expert to novice human tutors

[Diagram: the single-step schematic (Step start → T: Tell or T: Elicit → S: Correct → Step end; on S: Incorrect, remediation: T: Hint, or prompt, or explain, or analogy, or …), shown for novices and for experts]

Experts may have a wider variety of remediation moves.

Page 14: Schematic of an ITS’s handling of a single step

[Diagram: Step start → either T: Tell → Step end, or S: Correct → Step end. On S: Incorrect, remediation: T: Hint. T: Tell only if out of hints.]
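The ITS loop above can be sketched in Python. Everything here is hypothetical: `tutor_step`, the `get_attempt` and `check` callbacks, and the hint texts are illustrative stand-ins, and real systems such as Andes diagnose errors and choose hints in more sophisticated ways. The point is the control flow: a correct entry ends the step, an incorrect one triggers the next hint, and the tutor tells the step (the “bottom out” hint) only once the ordinary hints are used up.

```python
from typing import Callable, Sequence


def tutor_step(get_attempt: Callable[[], str],
               check: Callable[[str], bool],
               hints: Sequence[str],
               bottom_out: str,
               show: Callable[[str], None] = print) -> int:
    """Run one step; return how many hints (including any bottom-out hint) were shown."""
    for shown, hint in enumerate(hints):
        if check(get_attempt()):      # S: Correct -> Step end
            return shown
        show(hint)                    # S: Incorrect -> T: Hint
    if check(get_attempt()):          # last try after the final ordinary hint
        return len(hints)
    show(bottom_out)                  # out of hints -> T: Tell
    return len(hints) + 1


if __name__ == "__main__":
    # Scripted student for illustration: wrong twice, then right.
    answers = iter(["behind me", "behind me", "in my hands"])
    tutor_step(lambda: next(answers),
               lambda a: a == "in my hands",
               hints=["Compare your horizontal velocity with the pumpkin's.",
                      "Is the net force on the pumpkin exactly vertical?"],
               bottom_out="It lands in your hands.")
```

With the scripted student, both hints are printed and the step ends on the third attempt, without reaching the bottom-out hint.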

Page 15: Major differences

Low-interaction tutoring (e.g., CAI)
– Remediation on answer only
Step-based interaction (e.g., ITS)
– Remediation on each step
– Hint sequence, with final “bottom out” hint
Natural tutoring (e.g., human tutoring)
– Remediation on each step, substep, inference…
– Natural language dialogues
– Many tutorial tactics
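The difference between answer-only and per-step remediation can be made concrete with the pumpkin problem. In this Python sketch a solution is represented as a list of short step labels, which is an assumption for illustration (real tutors match student entries far more flexibly), and the names `answer_based`, `step_based`, and `REFERENCE` are hypothetical. An answer-based checker can only say whether the final answer is right; a step-based checker can point at the first step that needs remediation, even when the final answer happens to be correct.

```python
from typing import Optional, Sequence

# Hypothetical step labels for the six steps of the pumpkin problem.
REFERENCE = ["same horizontal velocity", "net force vertical",
             "vertical acceleration", "horizontal velocity unchanged",
             "same horizontal distance", "lands in hands"]


def answer_based(student: Sequence[str], reference: Sequence[str]) -> bool:
    """Low interaction: feedback on the final answer only."""
    return len(student) > 0 and student[-1] == reference[-1]


def step_based(student: Sequence[str], reference: Sequence[str]) -> Optional[int]:
    """Step-based: index of the first step needing remediation, or None if all match."""
    for i, expected in enumerate(reference):
        if i >= len(student) or student[i] != expected:
            return i
    return None


# A student with a wrong second step who still reaches the right final answer.
lucky = ["same horizontal velocity", "net force forward",
         "vertical acceleration", "horizontal velocity unchanged",
         "same horizontal distance", "lands in hands"]
print(answer_based(lucky, REFERENCE), step_based(lucky, REFERENCE))  # True 1
```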

Page 16: Conditions (VanLehn, Graesser et al., 2007)

Natural tutoring
– Expert human tutors
  » Typed
  » Spoken
– Natural language dialogue computer tutors
  » Why2-AutoTutor (Graesser et al.)
  » Why2-Atlas (VanLehn et al.)
Step-based interaction
– Canned text remediation
Low interaction
– Textbook

Page 17: Human tutors (a form of natural tutoring)

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation: T: Hint, or prompt, or explain, or analogy, or …]

Page 18: Why2-Atlas (a form of natural tutoring)

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation: a knowledge construction dialogue.]

Page 19: Why2-AutoTutor (a form of natural tutoring)

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation: hint or prompt.]

Page 20: Canned-text remediation (a form of step-based interaction)

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. On S: Incorrect, remediation: text.]

Page 21: Experiment 1: Intermediate students & instruction

[Figure: adjusted post-test scores (y-axis, 0 to 1) on multiple-choice and essay tests for human tutors (N=18), Why2-Atlas (N=22), Why2-AutoTutor (N=24), and canned text remediation (N=22)]

Page 22: Experiment 1: Intermediate students & instruction

[Figure: adjusted post-test scores (y-axis, 0 to 1) on multiple-choice and essay tests for human tutors (N=18), Why2-Atlas (N=22), Why2-AutoTutor (N=24), and canned text remediation (N=22)]

No reliable differences

Page 23: Experiment 2: AutoTutor > Textbook = Nothing

[Figure: adjusted post-test scores (y-axis, 0 to 1) on multiple-choice and essay tests for AutoTutor, Textbook, and Nothing]

Reliably different

Page 24: Experiments 1 & 2 (VanLehn, Graesser et al., 2007)

[Figure: adjusted post-test scores (y-axis, 0 to 1) for read-only textbook studying, step-based computer tutoring, Why2-AutoTutor, Why2-Atlas, and human tutoring]

No significant differences

Page 25: Experiment 3: Intermediate students & instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near-transfer essay, far-transfer essay, retention multiple-choice, and retention essay tests for Why2-AutoTutor (N=32) and canned text remediation (N=30)]

Deeper assessments

Page 26: Experiment 3: Intermediate students & instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near-transfer essay, far-transfer essay, retention multiple-choice, and retention essay tests for Why2-AutoTutor (N=32) and canned text remediation (N=30)]

No reliable differences

Page 27: Experiment 4: Novice students & intermediate instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice and essay tests for spoken human tutoring (N=14), typed human tutoring (N=20), and canned text remediation (N=20)]

Relearning

Page 28: Experiment 4: Novice students & intermediate instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice and essay tests for spoken human tutoring (N=14), typed human tutoring (N=20), and canned text remediation (N=20)]

All differences reliable

Page 29: Experiment 5: Novice students & intermediate (but shorter) instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near essay, and far essay tests for spoken human tutoring (N=21), Why2-Atlas (N=21), Why2-AutoTutor (N=21), and canned text remediation (N=19)]

Callouts: “Relearning”; “Add” (twice)

Page 30: Experiment 5: Novice students & intermediate instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near essay, and far essay tests for spoken human tutoring (N=21), Why2-Atlas (N=21), Why2-AutoTutor (N=21), and canned text remediation (N=19)]

No reliable differences

Page 31: Experiment 5: Low-pretest students only

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near essay, and far essay tests for spoken human tutoring (N=9), Why2-Atlas (N=7), Why2-AutoTutor (N=10), and canned text remediation (N=11)]

Aptitude-treatment interaction?

Page 32: Experiment 5: Low-pretest students only

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near essay, and far essay tests for spoken human tutoring (N=9), Why2-Atlas (N=7), Why2-AutoTutor (N=10), and canned text remediation (N=11)]

Spoken human tutoring > canned text remediation

Page 33: Experiments 6 and 7: Novice students & novice instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, fill-in-the-blank, and essay tests for Why2-AutoTutor, canned text remediation (Expt 6), text only, Why2-Atlas, and canned text remediation (Expt 7)]

Was the intermediate text over the novice students’ heads?

Page 34: Experiments 6 and 7: Novice students & novice instruction

[Figure: scores (y-axis, 0 to 1) on multiple-choice, fill-in-the-blank, and essay tests for Why2-AutoTutor, canned text remediation (Expt 6), text only, Why2-Atlas, and canned text remediation (Expt 7)]

No reliable differences

Page 35: Interpretation

[Diagram: content complexity vs. students (intermediates and novices, each split into high-pretest and low-pretest); regions labeled Experiments 1 & 4, Experiments 3 & 5, and Experiments 6 & 7]

Legend:
– Students can follow the reasoning only with the tutor’s help (ZPD): predict Tutoring > Canned text remediation
– Students can follow the reasoning without any help: predict Tutoring = Canned text remediation

Page 36: Original research questions

Can natural language tutorial dialog add pedagogical value?
– Yes, when students must study content that is too complex to be understood by reading alone
How feasible is a deep linguistic tutoring system?
– We built it. It’s fast enough to use.
Can deep linguistic and dialog techniques add pedagogical value?

Page 37: When content is too complex to learn by reading alone: Deep > Shallow?

[Figure: scores (y-axis, 0 to 1) on multiple-choice, near essay, and far essay tests for spoken human tutoring (N=9), Why2-Atlas (N=7), Why2-AutoTutor (N=10), and canned text remediation (N=11)]

Why2-Atlas is not clearly better than Why2-AutoTutor

Page 38: When to use deep vs. shallow?

Sentence understanding: shallow linguistic = LSA, Rainbow, Rappel; deep linguistic = Carmel: parser, semantics…; use both
Essay/discourse understanding: shallow linguistic = LSA; deep linguistic = abduction, Bnets; use deep
Dialog management: shallow linguistic = finite state networks; deep linguistic = reactive planning; use locally smart FSA
Natural language generation: shallow linguistic = text; deep linguistic = plan-based; use equivalent texts
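A finite-state dialogue network of the kind listed under dialog management can be sketched as a table of states, each holding a tutor turn and transitions keyed on a classified student reply. The Python sketch below is a hypothetical knowledge construction dialogue for the pumpkin problem; the state names, the keyword classifier, and the wording are illustrative assumptions, not the networks used in Why2-AutoTutor or Why2-Atlas, and it does not attempt the “locally smart” refinements the slide recommends.

```python
from typing import Callable, Dict, List, Tuple

# State -> (tutor turn, transitions keyed by reply category).
# A state whose only transition key is "any" is a "tell" state: the tutor
# states the point and moves on without waiting for a reply.
KCD: Dict[str, Tuple[str, Dict[str, str]]] = {
    "ask_speed": ("Before you toss it, do you and the pumpkin have the same speed?",
                  {"yes": "ask_force", "other": "tell_speed"}),
    "tell_speed": ("While you carry it, the pumpkin moves along with you at your speed.",
                   {"any": "ask_force"}),
    "ask_force": ("When you toss it up, is the net force on it exactly vertical?",
                  {"yes": "end", "other": "tell_force"}),
    "tell_force": ("Your push and gravity both act vertically, so the net force is vertical.",
                   {"any": "end"}),
}


def classify(reply: str) -> str:
    """Crude keyword matcher standing in for real language understanding."""
    return "yes" if reply.strip().lower().startswith("yes") else "other"


def run_kcd(get_reply: Callable[[], str], start: str = "ask_speed") -> List[str]:
    """Walk the network from `start` until the "end" state; return the tutor turns."""
    turns: List[str] = []
    state = start
    while state != "end":
        turn, transitions = KCD[state]
        turns.append(turn)
        if "any" in transitions:
            state = transitions["any"]            # tell state: no reply needed
        else:
            state = transitions[classify(get_reply())]
    return turns


replies = iter(["Yes", "I'm not sure"])
for turn in run_kcd(lambda: next(replies)):
    print("T:", turn)
```

Keeping the dialogue as data rather than code means an author can add or reorder questions without touching the interpreter loop.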

Page 39: Results from all 7 experiments (VanLehn, Graesser et al., 2007)

Why2: Atlas = AutoTutor
Why2 > Textbook
– No essays
– Content differences
Human tutoring = Why2 = Canned text remediation
– Except when novice students worked with instruction designed for intermediates; then Human tutoring > Canned text remediation

Page 40: Other evidence for the interaction plateau (Evens & Michael, 2006)

[Figure: mean gain (y-axis, 0 to 6) for Reading (1993), Reading (1999), Reading (2002), Circsim (1999), Circsim-Tutor (1999), Circsim-Tutor (2002), Human tutors (1999), and Human tutors (1993)]

No significant differences

Page 41: Other evidence for the interaction plateau (Reif & Scott, 1999)

[Figure: scores (y-axis, 0 to 100) for untutored, step-based tutoring, and human tutoring conditions]

No significant differences

Page 42: Other evidence for the interaction plateau (Chi, Roy & Hausmann, in press)

[Figure: adjusted deep post-test steps, % (y-axis, 0 to 70), for individuals + video, individuals + textbook, pairs + textbook, pairs + video, and human tutoring]

No significant differences

Page 43: Still more studies where natural tutoring = step-based interaction

Human tutors
1. Human tutoring = human tutoring with only content-free prompting for step remediation (Chi et al., 2001)
2. Human tutoring = canned text during post-practice remediation (Katz et al., 2003)
3. Socratic human tutoring = didactic human tutoring (Rosé et al., 2001a)
4. Socratic human tutoring = didactic human tutoring (Johnson & Johnson, 1992)
5. Expert human tutoring = novice human tutoring (Chae, Kim & Glass, 2005)

Natural language tutoring systems
1. Andes-Atlas = Andes with canned text (Rosé et al., 2001b)
2. Kermit = Kermit with dialogue explanations (Weerasinghe & Mitrovic, 2006)

Page 44: Hypothesis 1: Exactly how tutors remedy a step doesn’t matter much

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end. The remediation box after S: Incorrect is annotated “What’s in here doesn’t matter much.”]

Page 45: Main claim: There is an interaction plateau

[Figure: expected vs. observed learning gains (y-axis, 0 to 2.5) for low-interaction instruction, step-based instruction, and natural tutoring, annotated “Hypothesis 1”]

Page 46: Hypothesis 2: Cannot eliminate the step remediation loop

[Diagram: Step start → either T: Tell → Step end, or T: Elicit → S: Correct → Step end, with an S: Incorrect branch; annotated “Must avoid this”]

Page 47: Main claim: There is an interaction plateau

[Figure: expected vs. observed learning gains (y-axis, 0 to 2.5) for low-interaction instruction, step-based instruction, and natural tutoring, annotated “Hypothesis 2”]

Page 48: Conclusions

What does it take to make computer tutors as effective as human tutors?
– Step-based interaction
– Bloom’s 2-sigma results may have been due to weak control conditions (classroom instruction)
– Other evaluations have also used weak controls
When is natural language useful?
– For steps themselves (vs. menus, algebra…)
– NOT for feedback & hints (remediation) on steps

Page 49: Future directions for tutoring systems research

Making step-based instruction ubiquitous
– Authoring & customizing
– Novel task domains
Increasing engagement

Page 50: Final thought

Many people “just know” that more interaction produces more learning.

“It ain’t so much the things we don’t know that get us into trouble. It’s the things we know that just ain’t so.” – Josh Billings (a.k.a. Henry Wheeler Shaw)