Generative Grading CS398 - web.stanford.edu - GenerativeGrading.pdf
Generative Grading
CS398
Four Prototypical Trajectories
Check out this assignment on code.org
The Code.org Dataset
● Students learning nested loops
● 50k students with 1.5 million submissions to a curriculum of 8 exercises.
● 800 human labels across 2 of the exercises.
P4
https://studio.code.org/s/course4/stage/10/puzzle/4
Can you write an algorithm to give auto-feedback for this assignment?
[Figure: probability mass (log scale, 1E-6 to 1E+0) against rank order (log scale, 1E+0 to 1E+5)]
Edit distance is meaningless; code is Zipfian
Code Zipf Plot
[Figure: log-log plot, vertical axis 10^-6 to 10^0, horizontal axis 10^0 to 10^5]
$f(k) = \dfrac{1/k^s}{\sum_{n=1}^{N} 1/n^s}$
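The Zipf mass function on this slide can be evaluated directly. The following is a minimal sketch, assuming an exponent s = 1 and 100,000 distinct programs; both values are illustrative and are not taken from the Code.org data.

```python
def zipf_pmf(s, n_items):
    """Return Zipf probabilities for ranks 1..n_items with exponent s."""
    weights = [1.0 / k ** s for k in range(1, n_items + 1)]
    total = sum(weights)  # the normalising sum over all ranks
    return [w / total for w in weights]


# Assumed, illustrative parameters (not fitted to any dataset).
pmf = zipf_pmf(s=1.0, n_items=100_000)
head_mass = sum(pmf[:10])
print(f"mass of the 10 most common programs: {head_mass:.3f}")
print(f"mass left in the long tail: {1.0 - head_mass:.3f}")
```

Under these assumptions most of the probability mass sits outside the few most common programs, which is the "fat tail" the plot is meant to show.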
Exponential combination of decisions. Super fat tailed. Everything looks unique
Hard Problem
1 million unique solutions to programming Linear Regression
Brute force solution?
WWW 2014
Code Zipf Plot
[Figure: four panels, each with axis ticks 1, 100, 10000: Code.org A, Stanford A, Coursera B, Stanford C]
Evaluation Task
Stanford TAs label 800 submissions
import code.org.*;

public class MySoln {
  public void run() {
    move(50);
    for(int i=0; i<4; i++){
      if(frontIsClear()) {
        turnLeft(90);
      }
      for(int j=0; j<i; i++){
        move(i * 20);
        turnRight(120);
        move(10);
      }
    }
  }
}
Evaluation Task
[Figure: feedback F1 score (0.0 to 1.0), from the first problem (P1) to the last problem (P8)]
Traditional Deep Learning Doesn’t Work
Old Guard
Humans
Label student code
[Figure: feedback F1 score (0.0 to 1.0), from the first problem (P1) to the last problem (P8), for Old Guard, Humans, and Deep Learning]
Inaccurate, Uninterpretable, and Data Hungry
Label student code
[Figure: program embedding model. Labels: program tree (run, while, cond, body, putBeeper, move); raw precondition; encoder; multiply by program matrix; decoder; raw postcondition; prediction loss; autoencoding loss]
Piech et al., ICML 2014
Data Hungry
We need one-shot learning
We need verifiability
We need a strategy! How are we going to solve this problem???
[Suspense]
Taste of the future of AI
Humans Don’t Need Much Data
Single training example:
Test set:
Bayesian Program Learning
Lake et al. Human-level concept learning through probabilistic program induction
Bayesian Program Learning
Imagine Students
• Struggles with double for loops
• Confuses logic for deleting bricks
Imagine Students
A student's “ability”
Infer ability and choices from code
A student's “choices”
The resulting code
This is easy and exponential
This is hard and linear
Generative Understanding
[Figure: feedback F1 score (0.0 to 1.0), from the first problem (P1) to the last problem (P8), comparing Old Guard, Deep Learning, Generative Understanding (zero-shot learning), and Humans on labelling student code]
• Struggles with double for loops
• Confuses logic for deleting bricks
In simple terms: generate a ton of our own labelled examples. It’s an unreasonably good way to start hitting high performance.
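As a rough illustration of generating our own labelled examples, the sketch below samples programs from two hand-written decision points and records the feedback labels implied by each choice. The decision points, code templates, and label strings are all hypothetical; they are not taken from the lecture's actual grammar.

```python
import random

# Hypothetical decision points for a "draw a square" exercise.
# Each option pairs the code it emits with the feedback label it implies.
LOOP_CHOICES = [
    ("Repeat(4) {{ {body} }}", None),
    ("Repeat(3) {{ {body} }}", "wrong number of repetitions"),
    ("{body} {body} {body} {body}", "does not use a loop"),
]
TURN_CHOICES = [
    ("TurnLeft(90)", None),
    ("TurnLeft(45)", "wrong turn angle"),
]


def sample_program(rng):
    """Sample one (code, labels) pair by making every decision once."""
    loop_template, loop_label = rng.choice(LOOP_CHOICES)
    turn_code, turn_label = rng.choice(TURN_CHOICES)
    body = f"Move(100) {turn_code}"
    code = loop_template.format(body=body)
    labels = sorted(l for l in (loop_label, turn_label) if l is not None)
    return code, labels


def generate_dataset(n_samples, seed=0):
    """Sample repeatedly and keep each unique program with its labels."""
    rng = random.Random(seed)
    dataset = {}
    for _ in range(n_samples):
        code, labels = sample_program(rng)
        dataset[code] = labels
    return dataset


dataset = generate_dataset(200)
for code, labels in dataset.items():
    print(labels or ["no issues"], "<-", code)
```

Every generated program comes with its labels for free, because the labels are determined by the choices that produced the code.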
How many decisions with 3 options to get to 10K labelled programs?
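One way to work this out, assuming the decisions are independent and every combination of choices yields a distinct program (a simplification), is to find the smallest n with 3^n >= 10,000:

```python
def decisions_needed(options_per_decision, target_programs):
    """Smallest n such that options_per_decision ** n >= target_programs."""
    n = 0
    count = 1
    while count < target_programs:
        count *= options_per_decision
        n += 1
    return n


# 3**8 = 6561 is too few and 3**9 = 19683 is enough, so this prints 9.
print(decisions_needed(3, 10_000))
```

Under these assumptions, nine three-way decisions already cover more than 10K programs, which is why a small grammar can produce a large labelled set.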
How do I write said grammars??
ideaToText
Teachers articulate N misconceptions
This is code for a single decision point
Give a name to the choice that the student is making
How do those choices translate into grades?
What does the code look like? Often evokes other decision points
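The parts of a decision point listed above (a name for the choice, a mapping from choices to grades, and the code each choice produces) can be sketched as a small class. The class names, method names, and weights below are hypothetical and are not the ideaToText API; the sketch only shows the general shape.

```python
import random


class Decision:
    """Hypothetical base class: one named choice a student makes."""
    name = ""
    options = {}  # option -> relative weight

    def sample(self, rng, trace):
        choices = list(self.options)
        weights = list(self.options.values())
        choice = rng.choices(choices, weights=weights, k=1)[0]
        trace[self.name] = choice  # remember the choice for grading
        return self.render(choice, rng, trace)

    def render(self, choice, rng, trace):
        raise NotImplementedError


class TurnAngle(Decision):
    name = "turnAngle"
    options = {"90": 8, "45": 1, "120": 1}

    def render(self, choice, rng, trace):
        return f"TurnLeft({choice})"


class SquareLoop(Decision):
    name = "squareLoop"
    options = {"repeat": 3, "unrolled": 1}

    def render(self, choice, rng, trace):
        # Rendering this choice evokes another decision point.
        side = "Move(100) " + TurnAngle().sample(rng, trace)
        if choice == "repeat":
            return f"Repeat(4) {{ {side} }}"
        return " ".join([side] * 4)


def grade(trace):
    """Translate the recorded choices into feedback labels."""
    return {
        "usesLoop": trace["squareLoop"] == "repeat",
        "correctAngle": trace["turnAngle"] == "90",
    }


rng = random.Random(0)
trace = {}
code = SquareLoop().sample(rng, trace)
print(code)
print(grade(trace))
```

Keeping the choices in a trace separates "what the student decided" from "what the code looks like", so grades are read off the decisions and never parsed back out of the text.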
Starter Code in Blocks
Solution in Blocks
Starter Code in Pseudocode
For(???, ???, ???) {}
Solution in Pseudocode
For(15, 300, 15) {
  Repeat(4) {
    Move(Counter)
    TurnLeft(90)
  }
}
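To see what the pseudocode solution does, here is a small simulation. It assumes that For(15, 300, 15) counts from 15 to 300 inclusive in steps of 15, that Move takes a distance, and that TurnLeft takes degrees; these semantics are assumptions made for illustration.

```python
import math


def run_solution(start, stop, step):
    """Simulate the solution and return (side lengths, final position, final heading)."""
    x = y = 0.0
    heading = 0.0  # degrees, counter-clockwise
    sides = []
    for counter in range(start, stop + 1, step):  # assumes an inclusive stop
        for _ in range(4):  # Repeat(4)
            x += counter * math.cos(math.radians(heading))  # Move(Counter)
            y += counter * math.sin(math.radians(heading))
            heading = (heading + 90) % 360  # TurnLeft(90)
        sides.append(counter)
    return sides, (round(x, 6), round(y, 6)), heading


sides, position, heading = run_solution(15, 300, 15)
print(len(sides), "squares, side lengths from", sides[0], "to", sides[-1])
print("final position:", position, "final heading:", heading)
```

Each pass through the inner loop traces one closed square, so the artist ends where it started and the squares share a corner while growing in size.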
First challenge: solution or no solution.
Second challenge: draw the square using repeat or unrolled.
Pro Tips
Keep your grammar clean!
One idea per class
Have a knowledge prior
Decisions are not independent
Be clever in your use of outcomes
Template Choices
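Two of these tips, "Have a knowledge prior" and "Decisions are not independent", can be illustrated with a sketch in which a latent ability shifts the probability of one mistake, and that mistake in turn shifts the probability of another. The ability levels and all probabilities below are made up for illustration.

```python
import random


def sample_student(rng):
    """Sample (ability, loop_bug, angle_bug) with dependent decisions."""
    # Knowledge prior: a latent ability drawn before any code decision.
    ability = rng.choice(["novice", "fluent"])
    p_loop_bug = 0.6 if ability == "novice" else 0.1
    loop_bug = rng.random() < p_loop_bug
    # Not independent: the angle decision depends on the loop decision.
    p_angle_bug = 0.5 if loop_bug else 0.1
    angle_bug = rng.random() < p_angle_bug
    return ability, loop_bug, angle_bug


def angle_bug_rate(samples, given_loop_bug):
    """Empirical P(angle bug | loop bug == given_loop_bug)."""
    matching = [s for s in samples if s[1] == given_loop_bug]
    return sum(s[2] for s in matching) / len(matching)


rng = random.Random(0)
samples = [sample_student(rng) for _ in range(20_000)]
print("P(angle bug | loop bug):", round(angle_bug_rate(samples, True), 2))
print("P(angle bug | no loop bug):", round(angle_bug_rate(samples, False), 2))
```

Sampling the ability first means correlated mistakes show up together in the generated programs, which a set of independent coin flips would not reproduce.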