
Jonathan Huang Stanford University

Data driven student feedback for Programming Intensive MOOCs

Towards global-scale CS education

Leonidas Guibas Chris Piech Andy Nguyen

Steve Jobs Stanford, 2005

“all of my working-class parents' savings were being spent on my college tuition… …the minute I dropped out [of college] I could stop taking the required classes that didn't interest me, and begin dropping in on the ones that looked interesting.”

Course selection is better online

MOOC = Massive Open Online Course

Untapped potential

CS does not count towards high school math/science requirements in 36 of 50 states.

[Figure: trend by academic year, 1973-74 through 2013-14; y-axis ticks $0 to $40, in thousands. Source: www.code.org, www.collegeboard.org]

[Figure: projection for 2011, 2016, and 2021; y-axis 0 to 2,000 thousands]

Stanford class: 400 students, 10 TAs

ML-class: 100,000 students, 2,500 TAs (???)

Ease of global-scale feedback on a spectrum: from short response (multiple choice) to long response (essay questions, proofs).

Today: programming assignments.

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - alpha*1/m*(X'*(X*theta-y));
    J_history(iter) = computeCost(X, y, theta);
end

Feedback for Coding Assignments: Easy?


Run the test inputs through the submission and compare the test outputs: correct / incorrect?

(Linear regression submission, Homework 1, for Coursera's ML class)

The "but it works!!" solution:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    hypo = X*theta;
    newMat = hypo - y;
    trans1 = (X(:,1));
    trans1 = trans1';
    newMat1 = trans1 * newMat;
    temp1 = sum(newMat1);
    temp1 = (temp1 * alpha)/m;
    A = [temp1];
    theta(1) = theta(1) - A;
    trans2 = (X(:,2))';
    newMat2 = trans2*newMat;
    temp2 = sum(newMat2);
    temp2 = (temp2 * alpha)/m;
    B = [temp2];
    theta(2) = theta(2) - B;
    J_history(iter) = computeCost(X, y, theta);
end
theta(1) = theta(1);
theta(2) = theta(2);

Why??

Correctness: Good · Efficiency: Good · Style: Poor · Elegance: Poor

Better: theta = theta - (alpha/m) * X'*(X*theta-y)
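For reference, the vectorized one-liner and the student's element-by-element code compute the same update. Written out, the batch gradient descent step for linear regression is $\theta := \theta - \frac{\alpha}{m} X^\top (X\theta - y)$, or per component, $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big(x^{(i)\top}\theta - y^{(i)}\big)\, x_j^{(i)}$. The "but it works!!" solution evaluates the per-component form separately for j = 1 and j = 2; the one-liner computes all components in a single matrix expression.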

Let's do this for a class of 100,000, and in real time.

But it can't require too much instructor effort, so it must work for:
new programming problems, new courses, new programming languages.

We now have massive datasets…

Visualization of 40,000 implementations of linear regression submitted to Coursera’s ML course

[Moocshop, 2013]

[Figure: # students, Intro CS; scale from 1K and 10K up to 20M]

20M

Efficient index for “code phrases” of a

MOOC dataset

Shared structure discovery amongst

many student submissions

Applications such as bug finding (w/o execution) and

MOOC-scale feedback

Results on real MOOC with > 1 million

submissions

Codewebs Engine

First, an example application.

Correct:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - alpha*1/m*(X'*(X*theta-y));
    J_history(iter) = computeCost(X, y, theta);
end

Incorrect: identical, except for an extraneous sum in the update:

    theta = theta - alpha*1/m*sum(X'*(X*theta-y));


Feedback message: "Dear Lisa Simpson, consider the dimension of the expression X'*(X*theta-y), and what happens after you call sum on it…"

Syntax-based approach: attach this message to every submission containing that exact expression (covers 99 submissions).

But the extraneous sum bug takes many forms…

theta = theta-alpha*1/m*sum(X'*(X*theta-y));
theta = theta-alpha*1/m*sum(((theta'*X')'-y)'*X);
theta = theta-alpha*1/m*sum(transpose(X*theta-y)*X);

(Easier) Output-based approach: attach the message to every submission that matched the extraneous sum bug in unit-test output (covers 1,091 submissions).

Codewebs approach to feedback

Buggy expression: theta = theta-alpha*1/m*sum(X'*(X*theta-y));

Step 1: Find equivalent ways of writing the buggy expression using the Codewebs engine.
Step 2: Write a thoughtful/meaningful hint or explanation.
Step 3: Propagate the feedback message to any submission containing an equivalent expression.

# submissions covered by a single message:
Output based: 1,091
Codewebs: 1,208
Combined: 1,604
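A minimal sketch of this propagation loop, assuming a hypothetical index mapping code-phrase hashes to submission ids and a hypothetical codewebs_equivalents() helper standing in for Step 1 (neither name is the actual Codewebs API):

# Sketch: attach one human-written hint to every submission containing
# any equivalent form of the buggy expression.
def propagate_feedback(buggy_phrase, hint, index, codewebs_equivalents):
    covered = set()
    # Step 1: enumerate equivalent ways of writing the buggy expression.
    for phrase in codewebs_equivalents(buggy_phrase):
        # Step 3: collect every submission containing an equivalent phrase.
        covered |= index.get(hash(phrase), set())
    # Step 2 (the hint itself) was written by a human, once.
    return {submission_id: hint for submission_id in covered}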


That's a ~47% improvement over just using an output-based feedback system!!

Abstract syntax tree representations

ASTs ignore: • whitespace • comments • …

function A = warmUpExercise()
A = [];
A = eye(5);
endfunction

AST of A = eye(5):

ASSIGN
  IDENT (A)
  INDEX_EXP
    IDENT (eye)
    ARGUMENT_LIST
      CONST (5)
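The same point can be made executable; a tiny sketch using Python's standard ast module (Python here purely for illustration, since the assignments are in Octave):

import ast

# Two sources that differ only in whitespace and a comment...
src_a = "A = eye(5)  # identity"
src_b = "A   =   eye( 5 )"

# ...parse to structurally identical ASTs.
print(ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b)))  # True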

Indexing documents by phrases

term/phrase   document list
best          {1,3}
blue          {2,4,6}
bright        {7,8,10,11,12}
heat          {1,5,13}
kernel        {2,5,6,9,56}
sky           {1,2}
submarine     {2,3,4}
woes          {10,19,38}
yellow        {2,4}

Example documents: "The bright and blue butterfly hangs on the breeze…" · "We all something something yellow submarine…"

Example queries: "blue sky", "yellow submarine"

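A minimal sketch of such an inverted index in Python (toy documents; the numbering is made up for illustration):

from collections import defaultdict

def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {1: "the bright and blue butterfly hangs on the breeze",
        2: "we all something something yellow submarine"}
index = build_inverted_index(docs)

# A phrase query intersects the posting lists of its terms:
print(index["yellow"] & index["submarine"])  # {2}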

What basic queries should an AST search engine support?

Code Phrases: subtrees and subforests of an AST

AST of X'*(X*theta-y):

BINARY_EXP (*)
  POSTFIX (')
    IDENT (X)
  BINARY_EXP (-)
    BINARY_EXP (*)
      IDENT (X)
      IDENT (theta)
    IDENT (y)

Code Phrases: context within a larger subtree

A context is the surrounding tree with a subtree cut out, leaving a replacement site:

BINARY_EXP (*)
  POSTFIX (')
    IDENT (X)
  BINARY_EXP (-)
    [replacement site]
    IDENT (y)
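A sketch of code-phrase enumeration over a toy tuple-encoded AST (the encoding and helper names are mine, not the Codewebs implementation):

# Toy AST encoding: a node is (label, child1, child2, ...); a leaf is (label,).

def subtrees(node):
    # Every subtree rooted at or below `node` is a code phrase.
    yield node
    for child in node[1:]:
        yield from subtrees(child)

def one_hole_contexts(node):
    # A context: the node with one child subtree cut out, leaving a
    # replacement site.
    for i in range(1, len(node)):
        yield node[:i] + ("<replacement site>",) + node[i + 1:]

ast = ("BINARY_EXP(-)",
       ("BINARY_EXP(*)", ("IDENT(X)",), ("IDENT(theta)",)),
       ("IDENT(y)",))

print(sum(1 for _ in subtrees(ast)))   # 5 subtrees
print(next(one_hole_contexts(ast)))    # minus node with its '*' subtree cut out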

The Codewebs index

code phrase hash                              AST list
2ccf02adb1cbabfb347d3b5d0a05b249855a7583      {1,3}
b3bc37a318c2b895b3e644a12cfc6ebcfa5a06bd      {2,4,6}
b3353c96e2cee8ee6c3ba260e037a93ca0ba3a5e      {7,8,10,11,12}
2c01571626bf01338c8cdb15cf9d844d65f04645      {1,5,13}
313f48d3f5888afc5d5aa28ab1393d94661edd31      {2,5,6,9,56}
61d4bfccaa97cca2004102a297cc5acd281ea3a9      {1,2}
467b4d400aab42d3bf96a119c4620e74d6fe57b3      {2,3,4}
1ae95f6fa24bc25871cdc55cb472abdd68db93de      {10,19,38}

Example programs:

10 print "hello"
20 goto 10

10 for i=1:10
20 x = x+1
30 end


Naive indexing (very expensive, especially for large ASTs!):

def buildIndex():
    for A in ASTs:
        for every code phrase x contained in A:   # subtrees, subforests, contexts
            compute hashcode h[x]
            insert A at h[x]

Hashing Code Phrases

AST of X*theta - y:

BINARY_EXP (-)
  BINARY_EXP (*)
    IDENT (X)
    IDENT (theta)
  IDENT (y)

Step 1. Create a postorder listing of the nodes:

IDENT (X), IDENT (theta), BINARY_EXP (*), IDENT (y), BINARY_EXP (-)

Step 2. Hash the postorder list, e.g. via a polynomial hash $h(v_1, \dots, v_n) = \left( \sum_{i=1}^{n} \mathrm{code}(v_i)\, p^{i-1} \right) \bmod q$ with prime p.

Recycling hash computations

Observation: a subtree's postorder listing is a contiguous sublist of the full postorder listing, so hashing sublists of the postorder yields hashes of code phrases!


Idea of the DP: store prefix hashes and prime powers for all positions. O(n) in time and space.


After precomputation, can get any other hash in constant time!
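A sketch of that precomputation, assuming a standard polynomial rolling hash (the base and modulus are arbitrary choices of mine, not the paper's):

# O(n) precomputation of prefix hashes and powers, so the hash of any
# contiguous postorder sublist (i.e., any subtree's span) takes O(1).
P, Q = 1_000_003, (1 << 61) - 1   # base and modulus: arbitrary choices

def precompute(postorder):
    prefix, power = [0], [1]
    for label in postorder:
        prefix.append((prefix[-1] * P + hash(label)) % Q)
        power.append((power[-1] * P) % Q)
    return prefix, power

def span_hash(prefix, power, i, j):
    # Hash of postorder[i:j] in constant time.
    return (prefix[j] - prefix[i] * power[j - i]) % Q

postorder = ["IDENT(X)", "IDENT(theta)", "BINARY_EXP(*)", "IDENT(y)", "BINARY_EXP(-)"]
prefix, power = precompute(postorder)
print(span_hash(prefix, power, 0, 3))   # hash of the X*theta subtree's span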

Indexing is fast in practice

[Figure: time to index 1,000 ASTs; runtime (seconds, 0 to 3) vs. average AST size (0 to 500 nodes)]

Application: Statistical Bug Finding

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta-alpha*1/m*sum(X'*(X*theta-y));
    J_history(iter) = computeCost(X, y, theta);
end

Is sum(X'*(X*theta-y)) likely to be a bug? Compare against the solution, X'*(X*theta-y): query the index for every submission containing the sum-variant and inspect their unit-test results: Fail, Fail, Fail, Pass, Fail, Fail, … In fact, 83% of ASTs containing this code phrase were buggy!

To localize bugs, compute the bug probability for all subforests and return the smallest ones that look buggy. There are many ways to formulate probabilistic bug localization; we compute the probability on local contexts.
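A sketch of the underlying estimate, assuming the index maps each code-phrase hash to the submissions containing it and that unit-test failures are known (both structures are hypothetical stand-ins):

def bug_probability(phrase_hash, index, failing_submissions):
    # Estimate P(buggy | AST contains phrase) from the indexed data.
    asts = index.get(phrase_hash, set())
    if not asts:
        return 0.0
    return len(asts & failing_submissions) / len(asts)

# With outcomes Fail, Fail, Fail, Pass, Fail, Fail this gives 5/6 ≈ 0.83,
# matching the 83% figure above.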

Bug Detection Accuracy

[Figure: scatter plot of bug detection F-score, baseline (y-axis, 0 to 1) vs. Codewebs (x-axis, 0 to 1); higher is better on both axes. Each point represents a single coding problem; bubble size = average # nodes per submitted AST. Labeled problems: neural net training with backpropagation, logistic regression objective, linear regression with gradient descent.]

More than one way to skin a cat…

Canonicalization: apply semantics-preserving transformation rules to ASTs to increase the probability of matching.

Difficulties with Canonicalization

1. Impossible to predict all the ways of writing the same thing.
2. Canonicalization rules are typically not generalizable across languages.

Many equivalent forms of X*(Y+Z):

X*1*(Y+Z)
X*(Z+Y)
(Y+Z)*X
X*Z+X*Y
Z*X+X*Y
Z*X+Y*X
X*[1;1]'*[Y;Z]
X*ones(1,2)*[Y;Z]
X*repmat(1,2,1)'*[Y;Z]
X*ones(1,length([Y;Z]))*[Y;Z]
X*transpose([1;1])*[Y;Z]
X*repmat(1,size([Y;Z],1),1)'*[Y;Z]
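To make the difficulty concrete, here is what one hand-written rule might look like: normalizing operand order under commutative operators. This is a generic rule-based sketch, not a Codewebs rule, and it already shows the pitfalls:

COMMUTATIVE = {"+", "*"}   # valid for scalars; wrong for matrix '*'!

def canonicalize(node):
    # Toy AST: ("op", left, right) or a leaf string; sort the operands of
    # commutative operators so equivalent forms compare equal.
    if isinstance(node, str):
        return node
    op, *children = node
    children = [canonicalize(c) for c in children]
    if op in COMMUTATIVE:
        children.sort(key=repr)
    return (op, *children)

print(canonicalize(("*", "X", ("+", "Z", "Y"))) ==
      canonicalize(("*", ("+", "Y", "Z"), "X")))   # True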

Codewebs approach: use data to determine the canonicalization rules, and customize the rules to each assignment. They don't need to be perfect; we're not building a compiler!

Here's the idea. These two implementations agree:

def residual(X, theta, y):
    hypothesis = X * theta
    solution = hypothesis - y
    return solution

def residual(X, theta, y):
    hypothesis = (theta' * X')'
    solution = hypothesis - y
    return solution

Counterexample: agreement can be context-dependent.

def foo():
    solution = solveProblem()
    print(solution)
    return solution

= ?

def foo():
    solution = solveProblem()
    print(!solution)
    return solution

Join on context: query the index for each variant within the same surrounding context and compare unit-test outcomes.

Variant A: Fail Fail Pass Pass Fail Pass
Variant B: Fail Fail Pass Pass Fail Pass

= ? 100% probability of equivalence! (** fine print: need to account for sample size in general)
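A sketch of this test, assuming a hypothetical outcomes(context, phrase) lookup that returns the unit-test results of submissions containing that phrase within that context:

from collections import Counter

def probably_equivalent(phrase_a, phrase_b, shared_contexts, outcomes):
    # Two phrases are judged equivalent if, joined on identical context,
    # their submissions show the same unit-test outcome distributions.
    for ctx in shared_contexts:
        if Counter(outcomes(ctx, phrase_a)) != Counter(outcomes(ctx, phrase_b)):
            return False
    return True   # fine print: should also account for sample size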

Workflow

theta = theta-alpha*1/m*(X'*(X*theta-y));

Human provides the semantic labels: "alphaOverM", "prediction", "residual".

Equivalence class {m}:
length(y)   size(X, 1)   size(y, 1)
rows(X)     m            rows(y)
length(X)   length(x(:, 1))   size(X)(1)

Equivalence class {residual}:
(theta' * X' - y')'
(X * theta - y)
({hypothesis} - y)
({hypothesis}' - y')'
[{hypothesis} - y]
sum({hypothesis} - y, 2)

Equivalence class {alphaOverM}:
alpha * (1 ./ {m})   alpha * (1 / {m})   alpha * 1 ./ {m}
.01 / {m}            alpha * {m} ^ -1    alpha .* (1 ./ {m})
1 / {m} * alpha      alpha ./ {m}        alpha .* (1 / {m})
alpha * inv({m})     1 .* alpha ./ {m}   alpha * pinv({m})
alpha / {m}          alpha .* 1 / {m}    1 * alpha / {m}

Equivalence class {hypothesis}:
(theta' * X')'
(X * theta)
theta(1) + theta(2) * X(:, 2)
(X * theta(:))
[X] * theta
sum(X.*repmat(theta',{m},1), 2)
…
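A sketch of how the labeled equivalence classes might be applied: rewriting any known member of a class to its canonical label, so many surface forms collapse to one template (class contents abridged from the lists above; plain string replacement is a simplification of the actual subtree matching):

EQUIV = {
    "{m}": ["length(y)", "size(X, 1)", "rows(X)"],
    "{hypothesis}": ["(theta' * X')'", "X * theta"],
}

def to_template(expr):
    # Replace longer forms first so nested matches don't clobber each other.
    for label, forms in EQUIV.items():
        for form in sorted(forms, key=len, reverse=True):
            expr = expr.replace(form, label)
    return expr

print(to_template("(theta' * X')' - y"))   # {hypothesis} - y
print(to_template("X * theta - y"))        # {hypothesis} - y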

Canonicalization improves bug detection accuracy

[Figure: bug detection F-score (0.65 to 0.95, higher is better) vs. # unique ASTs considered, with and without canonicalization]

How many submissions can we give feedback to with fixed effort?

[Figure: # submissions covered (out of 40,000) vs. # equivalence classes (0 to 19); canonicalization with 25 marked ASTs vs. no canonicalization with 200 marked ASTs]

If we can find shared structure, we can facilitate feedback

In education: ASTs, proofs, essays, architecture, poems…

"gradient", "residual", "learning rate" (code) · "base case", "inductive hypothesis" (proofs) · "apartheid", "Afrikaner Calvinism", "elections in South Africa" (essays)

Data is revolutionizing many fields

And it can revolutionize education too!

Thank you!! jhuang11@stanford.edu http://www.stanford.edu/~jhuang11 @jonathanhuang11