Peer-Grading in a Course on Algorithms and Data Structures
Machine Learning Algorithms do not Improve over Simple Baselines

Mehdi S. M. Sajjadi, Morteza Alamgir, Ulrike von Luxburg
MPI IS Tübingen, Uni Hamburg, Uni Tübingen

Learning @ Scale, 26.04.2016
ei.is.tue.mpg.de/~msajjadi



Peer-Grading Setting

• How to aggregate grades?
• Are peer grades as accurate as TA grades?
• Challenges
  - Inexperience
  - Bias
  - Limited number of grades
  - Cheating


Our University Class

• Algorithms and Data Structures (tasks ranging from easy to demanding)
• 1 semester, 220 students, ~14,000 grades
• Groups of 3 students for solving exercises
• Detailed scoring rubric
• Grading done individually, by each student on their own
• All submissions graded in 3 different ways:
  - Self-assessment, peer grading, TA grading


A glance at one exercise…

• Easy multiple choice with proofs, 0 or 1 point each
• Mean absolute deviation from TA grade
  - 0.08 Peer
  - 0.12 Self
  - 0.08 Peer + Self
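The mean absolute deviation reported here can be computed as in the following minimal sketch. The toy grades are hypothetical (not taken from the course dataset), and the way "Peer + Self" pools the self grade with the peer grades is an assumption, since the slides do not spell out the aggregation.

```python
from statistics import mean

def mean_absolute_deviation(estimates, ta_grades):
    """Average of |estimate - TA grade| over all submissions."""
    return mean(abs(e - t) for e, t in zip(estimates, ta_grades))

# Hypothetical toy data: one row of peer grades per submission (0 or 1 point).
peer_grades = [[1.0, 1.0, 0.0],
               [0.0, 0.0, 0.0],
               [1.0, 0.0, 1.0],
               [1.0, 1.0, 1.0]]
self_grades = [1.0, 1.0, 1.0, 1.0]
ta_grades = [1.0, 0.0, 1.0, 1.0]

# Aggregate peer grades with the plain mean.
peer_mean = [mean(row) for row in peer_grades]
# Assumption: "Peer + Self" treats the self grade as one more peer grade.
peer_plus_self = [mean(row + [s]) for row, s in zip(peer_grades, self_grades)]

mad_peer = mean_absolute_deviation(peer_mean, ta_grades)
mad_self = mean_absolute_deviation(self_grades, ta_grades)
mad_both = mean_absolute_deviation(peer_plus_self, ta_grades)
print(mad_peer, mad_self, mad_both)
```

Lower values mean closer agreement with the TA grade, which is used as the reference throughout.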


…and another exercise.

• Design an algorithm, prove its runtime
• Submitted algorithms had a bad runtime
• Students did not realize the mistake
• Mean absolute deviation from TA grade
  - 0.18 Peer
  - 0.28 Self
  - 0.21 Peer + Self


Overall Grade Comparison

• Overall bias
  - 0.06 Peer
  - 0.12 Self
• Large variance
• Good base for improvements?
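A small sketch of how bias and variance of the grading errors can be separated. It assumes that "bias" means the mean signed deviation from the TA grade, which the slides do not define explicitly; the numbers are hypothetical toy values.

```python
from statistics import mean, pvariance

def error_summary(grades, ta_grades):
    """Return (bias, variance) of the signed errors grade - TA grade."""
    errors = [g - t for g, t in zip(grades, ta_grades)]
    return mean(errors), pvariance(errors)

# Hypothetical toy data, normalized to [0, 1].
peer = [0.9, 0.4, 0.8, 0.5]
ta = [0.8, 0.5, 0.6, 0.5]

bias, variance = error_summary(peer, ta)

# Subtracting a constant bias shifts every grade by the same amount,
# so the mean error goes to zero but the variance is untouched.
corrected = [g - bias for g in peer]
bias_after, variance_after = error_summary(corrected, ta)
print(bias, variance, bias_after, variance_after)
```

The point of the example is that a constant shift removes the bias but leaves the variance of the errors exactly as it was.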


Probabilistic Model

• $z_{ag} \sim \mathcal{N}(s_a + b_g,\ 1/r_g)$   Reported scores
• $s_a \sim \mathcal{N}(\mu,\ \sigma^2)$   True scores
• $b_g \sim \mathcal{N}(0,\ \eta^2)$   Bias
• $r_g \sim \Gamma(\alpha,\ \beta)$   Reliability
• EM algorithm for parameter estimation
• Works well on artificial data
  - Other algorithms in the literature yield similar results

C. Piech et al. Tuned Models of Peer Assessment in MOOCs. EDM, 2013.
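To make the generative model concrete, the sketch below draws artificial data from it and applies the plain mean as a baseline. It is not the EM algorithm from the talk. Several things are assumptions: the index a is read as the submission and g as the grader, the Gamma prior is taken as shape α and rate β, and all parameter values are arbitrary illustrative choices.

```python
import random
from statistics import mean

def sample_peer_grades(n_submissions=200, n_graders=50, grades_per_submission=6,
                       mu=0.7, sigma=0.2, eta=0.05, alpha=10.0, beta=0.1, seed=0):
    """Draw (true scores, observations) from the model; all defaults are arbitrary."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(mu, sigma) for _ in range(n_submissions)]   # s_a
    biases = [rng.gauss(0.0, eta) for _ in range(n_graders)]             # b_g
    # r_g ~ Gamma(shape=alpha, rate=beta); gammavariate expects a scale, i.e. 1/rate.
    reliabilities = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_graders)]
    observations = []
    for a, s in enumerate(true_scores):
        for g in rng.sample(range(n_graders), grades_per_submission):
            # z_ag ~ N(s_a + b_g, 1/r_g): the variance is the inverse reliability.
            z = rng.gauss(s + biases[g], (1.0 / reliabilities[g]) ** 0.5)
            observations.append((a, g, z))
    return true_scores, observations

def mean_estimator(observations, n_submissions):
    """Baseline: average all reported scores of each submission."""
    by_submission = [[] for _ in range(n_submissions)]
    for a, _, z in observations:
        by_submission[a].append(z)
    return [mean(zs) for zs in by_submission]

true_scores, observations = sample_peer_grades()
estimates = mean_estimator(observations, len(true_scores))
mad = mean(abs(e - s) for e, s in zip(estimates, true_scores))
print(f"MAD of the mean estimator on artificial data: {mad:.3f}")
```

On artificial data the true scores are known, so any estimator can be scored against them directly; on the course data the TA grades play that role.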


Results on our dataset


What about the ranking?


Why no improvements?


Possible Reasons I

• Amount of data? (6 peer grades per assignment)
  - Same results with 15 peer grades
  - Bias and reliability estimation over all assignments
• Wrong priors?
  - They barely change the results
• Unmotivated students / useless reviews?
  - Very few, and models should benefit from them anyway
• Different types of assignments?
  - Grouping them by grading difficulty did not change results


Possible Reasons II

• (Different) TAs as baseline?
  - The 6 TAs graded similarly
  - Some noise in the ground truth does not hurt the models
• Bias vs. reliability!
  - Most errors are due to low reliabilities, not bias
  - This kind of error is hard to correct (artificial models)
• Other sources of errors!
  - Errors are often a result of lacking knowledge
  - Hard to correct for this
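The bias-versus-reliability point can be illustrated with a toy simulation. It uses a simple heuristic (subtract each grader's average deviation from the per-submission mean), not the models evaluated in the talk, and every parameter value is an arbitrary assumption.

```python
import random
from statistics import mean

N_SUBS, N_GRADERS, K = 200, 50, 6  # arbitrary sizes, K grades per submission

def simulate(bias_sd, noise_sd, seed=0):
    """Return (true scores, list of (submission, grader, grade))."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.7, 0.2) for _ in range(N_SUBS)]
    bias = [rng.gauss(0.0, bias_sd) for _ in range(N_GRADERS)]
    obs = [(a, g, truth[a] + bias[g] + rng.gauss(0.0, noise_sd))
           for a in range(N_SUBS)
           for g in rng.sample(range(N_GRADERS), K)]
    return truth, obs

def mean_estimate(obs):
    """Plain mean of all grades per submission."""
    sums, counts = [0.0] * N_SUBS, [0] * N_SUBS
    for a, _, z in obs:
        sums[a] += z
        counts[a] += 1
    return [s / c for s, c in zip(sums, counts)]

def bias_corrected_estimate(obs):
    """Heuristic: estimate each grader's bias as the average deviation
    from the per-submission mean, subtract it, then average again."""
    consensus = mean_estimate(obs)
    deviations = [[] for _ in range(N_GRADERS)]
    for a, g, z in obs:
        deviations[g].append(z - consensus[a])
    est_bias = [mean(d) if d else 0.0 for d in deviations]
    return mean_estimate([(a, g, z - est_bias[g]) for a, g, z in obs])

def compare(bias_sd, noise_sd):
    """Mean absolute deviation from the truth: (plain mean, bias-corrected)."""
    truth, obs = simulate(bias_sd, noise_sd)
    mad = lambda est: mean(abs(e - t) for e, t in zip(est, truth))
    return mad(mean_estimate(obs)), mad(bias_corrected_estimate(obs))

plain_bias, corrected_bias = compare(bias_sd=0.15, noise_sd=0.02)    # errors mostly bias
plain_noise, corrected_noise = compare(bias_sd=0.0, noise_sd=0.15)   # errors only noise
print(plain_bias, corrected_bias, plain_noise, corrected_noise)
```

When the errors come from consistent grader bias, the correction has something to remove; when they come from unreliable (noisy) grading, there is no per-grader offset to estimate and the corrected result stays close to the plain mean.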


Conclusion

• Models fail to improve over the mean estimator
• Main reasons
  - Sources of errors differ from the model assumptions
  - Not much bias
  - Reliability is difficult to estimate and correct
• Are complicated models acceptable for students?
• Is peer grading a viable option for university courses?
• The dataset is publicly available on our websites
