Student simulation and evaluation DOD meeting Hua Ai ([email protected]) 03/03/2006.
Outline
- Motivations
- Backgrounds
- Corpus
- Student Simulation Model
- Comparisons
- Conclusions & Future Work
Motivations
- For a larger corpus
  - Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically
  - The best strategy may not even be present in a small dataset
- For a cheaper corpus
  - Human subjects are expensive
[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Components: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Strategy, Reinforcement Learning]
Backgrounds (1)
- Education community
  - Focuses on changes in the student's internal knowledge representation
  - Usually not dialogue based
  - Simulated students are used for (VanLehn et al., 1994):
    - Tutor training
    - Collaborative learning
Backgrounds (2)
- Dialogue community
  - Focuses on interactions and dialogue behaviors
  - Simulated users have a limited set of actions (Schatzmann et al., 2005)
  - Simulation is done at the dialogue act (DA) level
Corpus (1)
- Spoken dialogue physics tutor (ITSPOKE)
Corpus (2)
- Tutoring procedure
[Figure: tutoring procedure over 5 problems. Each problem pairs a dialogue ((T) Question, (S) Answer, (T) Q, (S) A, …) with an essay revision, and the cycle repeats for every problem]
Corpus (3)
- Tutor's behaviors
  - Defined in KCDs (Knowledge Construction Dialogues)
[Figure: the tutor's next move branches on whether the student's answer is Correct or Incorrect/Partially Correct]
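To make the branching concrete, here is a minimal Python sketch of how a simulated student could be plugged into one KCD. It assumes a KCD is an ordered list of tutor questions and that, on the Incorrect/Partially Correct branch, the tutor gives one remediation turn before moving on; the slide only names the two branches, so `run_kcd`, the `(question, remediation)` step format and the `is_correct` classifier are all hypothetical.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]   # (speaker, utterance); "T" = tutor, "S" = student
Step = Tuple[str, str]   # (tutor question, remediation text): hypothetical format

def run_kcd(steps: List[Step],
            student: Callable[[str], str],
            is_correct: Callable[[str, str], bool]) -> List[Turn]:
    """Run one KCD and return the resulting transcript."""
    transcript: List[Turn] = []
    for question, remediation in steps:
        transcript.append(("T", question))
        reply = student(question)              # real or simulated student turn
        transcript.append(("S", reply))
        if not is_correct(question, reply):    # Incorrect/Partially Correct branch
            transcript.append(("T", remediation))
        # Correct branch: go straight to the next tutor question.
    return transcript

# Usage with a placeholder answer key and a student who always answers "wrong".
steps = [("question1", "remediation1"), ("question2", "remediation2")]
key = {"question1": "answer1", "question2": "answer2"}
for speaker, text in run_kcd(steps, lambda q: "wrong", lambda q, a: key[q] == a):
    print(speaker, text)
```

Because the student is just a function from question to answer, the same loop can be driven by a real transcript or by any of the simulation models below.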
Corpus (4)
Corpus                 #dialogues  Stat   stuWord   stuTurn   tutorWord  tutorTurn
f03 (synthesized)      100         avg    57.16     23.35     1256.92    29.64
                                   stdev  45.57638  17.44334  849.8195   19.76351
05syn (synthesized)    136         avg    91.0963   30.78519  1655.467   38.06667
                                   stdev  53.82931  14.42551  757.8744   16.32469
05pre (pre-recorded)   135         avg    87.34559  30.11765  1597.206   37.33088
                                   stdev  55.48004  16.96972  832.9845   18.20096
- f03 vs. s05: different groups of subjects
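The averages and standard deviations in the table appear to be per-dialogue statistics over each corpus. A minimal sketch of how such numbers could be computed, assuming each dialogue is a list of `(speaker, utterance)` pairs with speaker `"T"` or `"S"`, that words are whitespace-separated tokens, and that stdev is the sample standard deviation (the slide states none of these); `dialogue_counts` and `corpus_summary` are hypothetical helpers.

```python
from statistics import mean, stdev

def dialogue_counts(dialogue):
    """Count turns and words per speaker; dialogue is a list of (speaker, utterance)."""
    counts = {"stuWord": 0, "stuTurn": 0, "tutorWord": 0, "tutorTurn": 0}
    for speaker, utterance in dialogue:
        n_words = len(utterance.split())      # assumed: whitespace tokenization
        if speaker == "S":
            counts["stuTurn"] += 1
            counts["stuWord"] += n_words
        else:                                 # "T": tutor turn
            counts["tutorTurn"] += 1
            counts["tutorWord"] += n_words
    return counts

def corpus_summary(dialogues):
    """(avg, stdev) of each count across dialogues; needs at least two dialogues."""
    per_dialogue = [dialogue_counts(d) for d in dialogues]
    summary = {}
    for name in ("stuWord", "stuTurn", "tutorWord", "tutorTurn"):
        values = [c[name] for c in per_dialogue]
        summary[name] = (mean(values), stdev(values))   # assumed: sample stdev
    return summary
```

If the table's stdev is a population figure instead, `statistics.pstdev` would replace `stdev`.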
Simulation Models (1)
- Simulating at the word level
  - Students have more complex behaviors
  - DA information alone isn't enough for the system
- Two models trained on two corpora
[Figure: models {ProbCorrect, Random} × training corpora {f03, s05} → 03ProbCorrect, 03Random, 05ProbCorrect, 05Random]
Simulation Models (2)
- ProbCorrect Model
  - Simulates the average knowledge level of real students
  - Simulates meaningful dialogue behaviors
- Random Model
  - Produces nonsensical behavior
  - Serves as a contrast
ProbCorrect Model
Real corpus:
  question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
  question2: Answer2_1 (c), Answer2_2 (ic)
Candidate answers:
  For question1: c:ic = 1:2; c: Answer1_1; ic: Answer1_2, Answer1_3
  For question2: c:ic = 1:1; c: Answer2_1; ic: Answer2_2
ProbCorrect Model, asked question 1, answers as follows:
  1) Choose to give a correct (c) or incorrect (ic) answer with the same average probability as real students
  2) Randomly choose one answer from the corresponding answer set
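The two steps above can be sketched in Python as follows. It uses the toy corpus from the slide; the class and method names (`ProbCorrectModel`, `p_correct`, `answer`) are hypothetical, and estimating the correctness probability separately for each question follows the per-question c:ic ratios in the example rather than any stated implementation detail.

```python
import random

# Toy real corpus from the slide: (question, answer, label), label "c" or "ic".
REAL_CORPUS = [
    ("question1", "Answer1_1", "c"),
    ("question1", "Answer1_2", "ic"),
    ("question1", "Answer1_3", "ic"),
    ("question2", "Answer2_1", "c"),
    ("question2", "Answer2_2", "ic"),
]

class ProbCorrectModel:
    def __init__(self, corpus):
        # Candidate answer sets per question, split by correctness label.
        self.answers = {}
        for question, answer, label in corpus:
            sets = self.answers.setdefault(question, {"c": [], "ic": []})
            sets[label].append(answer)

    def p_correct(self, question):
        # Share of real-student answers to this question labelled correct.
        sets = self.answers[question]
        return len(sets["c"]) / (len(sets["c"]) + len(sets["ic"]))

    def answer(self, question, rng=random):
        # Step 1: pick c or ic with the real students' probability.
        label = "c" if rng.random() < self.p_correct(question) else "ic"
        # Step 2: pick uniformly from the candidate answers with that label.
        return rng.choice(self.answers[question][label])

model = ProbCorrectModel(REAL_CORPUS)
print(model.p_correct("question1"))         # c:ic = 1:2, so 1/3
print(model.answer("question1", random.Random(0)))
```

Passing a seeded `random.Random` makes a simulated corpus reproducible, which helps when comparing runs.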
Random Model
HC03&05:
  Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
  Question2: Answer2_1, Answer2_2
Candidate answers: 1) Answer1_1 2) Answer1_2 3) Answer1_3 4) Answer1_4 5) Answer2_1 6) Answer2_2
Big random model, asked question i:
  Answer: any of the 6 answers with equal probability (regardless of the question!)
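A matching sketch of the big random model, again on the slide's toy data; `RandomModel` and its method names are hypothetical. It pools every answer observed for any question and ignores the question when sampling, which is what makes its dialogues nonsensical.

```python
import random

# Toy human corpus from the slide (HC03&05): answers observed per question.
HUMAN_CORPUS = {
    "Question1": ["Answer1_1", "Answer1_2", "Answer1_3", "Answer1_4"],
    "Question2": ["Answer2_1", "Answer2_2"],
}

class RandomModel:
    def __init__(self, corpus):
        # Pool every candidate answer and forget which question it belonged to.
        self.pool = [a for answers in corpus.values() for a in answers]

    def answer(self, question, rng=random):
        # The question is deliberately ignored: all pooled answers are equally likely.
        return rng.choice(self.pool)

model = RandomModel(HUMAN_CORPUS)
print(len(model.pool))                      # 6 candidate answers
print(model.answer("Question2", random.Random(0)))
```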
Experiments
- Comparisons between real corpora
- Comparisons between real and simulated corpora
- Comparisons between simulated corpora
Real Corpora Comparisons (1)
- Evaluation metrics
  - High-level dialog features
  - Dialog style and cooperativeness
  - Dialog success rate and efficiency
  - Learning gains
Real corpora comparisons (2)
High-level dialog features

Real corpora comparisons (3)
Dialogue style features
Real corpora comparisons (4)
Dialogue success rate

Real corpora comparisons (5)
Learning gains features
Results
- Differences captured by these simple metrics cannot tell us whether a corpus is real or not (Schatzmann et al., 2005)
- The differences could be due to different user populations
Real vs. Simulated Corpora Comparisons
[Figure: bar chart (y axis 0 to 2) of tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate and correctRate for f03, 03smooth, 03random, s05 and 05smooth]
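The chart puts features on very different scales (tutorWord averages above 1000 in the corpus table) on a single axis running from 0 to 2, so the bars were presumably normalized. The slide does not say how; one plausible scheme is dividing each corpus's feature mean by a reference corpus's mean, sketched below. The definitions of tWordRate (words per tutor turn), sWordRate (words per student turn) and correctRate (correct answers per student turn), the input field `correct`, and the helpers `features` and `normalized_means` are all assumptions.

```python
from statistics import mean

FEATURES = ["tutorTurn", "tutorWord", "tWordRate",
            "stuTurn", "stuWord", "sWordRate", "correctRate"]

def features(d):
    """Per-dialogue features from raw counts (hypothetical field names)."""
    return {
        "tutorTurn": d["tutorTurn"],
        "tutorWord": d["tutorWord"],
        "tWordRate": d["tutorWord"] / d["tutorTurn"],  # assumed: words per tutor turn
        "stuTurn": d["stuTurn"],
        "stuWord": d["stuWord"],
        "sWordRate": d["stuWord"] / d["stuTurn"],      # assumed: words per student turn
        "correctRate": d["correct"] / d["stuTurn"],    # assumed: correct answers per student turn
    }

def normalized_means(corpus, reference):
    """Feature means of `corpus` divided by those of `reference` (1.0 = identical)."""
    corpus_feats = [features(d) for d in corpus]
    ref_feats = [features(d) for d in reference]
    return {
        f: mean(x[f] for x in corpus_feats) / mean(x[f] for x in ref_feats)
        for f in FEATURES
    }
```

Under this scheme the reference corpus scores 1.0 on every feature, so a bar above or below 1 shows how far another corpus drifts from it on that feature.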
Results (1)
- Most of the measurements can distinguish between the Random and ProbCorrect models
- The ProbCorrect model generates more realistic behaviors
- We can't draw conclusions about the power of these metrics, since the two simulated corpora are very different
Results (2)
- Differences between the real corpora and the Random model are captured clearly, but differences between the real corpora and the ProbCorrect model are not
- We don't expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small
Results (3)
- s05 variety > f03 variety
- 05ProbCorrect variety > 03ProbCorrect variety
- However, we don't get significantly more variety in the simulated corpora than in the real ones
  - Possibly because the computer tutor is simple (c/ic)
  - We're using the same candidate answer set
Results (4)
- ProbCorrect models trained on different real corpora are quite different
- A ProbCorrect model is more similar to the real corpus it is trained on than to the other real corpus
Comparisons between simulated dialogues with different dialogue structures
[Figure: bar charts for f03problem34 and f03problem7 comparing 03prob, 03smoothed and 03random; the problem7 panel shows tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate and correctRate]
Results
- The differences between the two simulated corpora are larger in prob7 than in prob34
- The dialogue structure of prob34 is more restricted
- The power of these simple metrics is limited by the dialogue structure
Conclusions
- The simple measurements can distinguish between:
  - Real corpora
    - From different populations
  - Simulated and real corpora
    - To different extents
  - Simulated corpora
    - From different models
    - Trained on different corpora
    - Limited to different dialog structures
Future work
- Explore “deep” evaluation metrics
  - Test the simulated corpus on policy learning
- More simulation models
  - More human features: emotion, learning
  - Special cases: quick learners, slow learners