Posted on 19-Jul-2015
Does peer grading work? How to implement and improve it?
Comparing instructor and peer assessment in MOOC GdP
Rémi Bachelet, Drissa Zongo, Aline Bourelle
Download this slideshow: http://goo.gl/GiFvXb
Massive evaluation in MOOCs: peer assessment vs. quizzes
• Quizzes – massive scale, but:
– unable to process, grade, and give feedback on complex, open-ended student assignments
– no critical thinking involved
• Peer assessment – evaluating rich assignments on a massive scale
– Is it possible? Is it accurate?
– Major learning benefits expected:
• student autonomy, a shift in teaching paradigm
• higher levels of learning in Bloom's taxonomy
4 research questions
1. How to train MOOC students to grade their peers and provide constructive feedback? – qualitative/experience testing
2. Is peer grading as accurate as instructor grading? Even superior? – quantitative data/hypothesis testing
3. Which grading algorithm is best? – quantitative data/hypothesis testing
4. How many peer grades are required to produce an accurate final grade? – quantitative data/hypothesis testing
“Fundamentals of Project Management” MOOC (MOOC GdP), session no. 2
• Dataset: 1011 down to 831 assignments submitted each week, over 5 weeks
– 4650 assignments in total
• Variety of assignments (next slide)
• Both instructor and peer grading were available
– 3-5 peer grades and one instructor/AT grade per assignment
5 assignments
Q1: How to train students to grade their peers and provide constructive feedback?
• Generic peer-evaluation training – a major requirement of the advanced track
– 2+ videos:
• rationale and importance of peer assessment
• how to write motivating and constructive feedback
• guidelines on using the platform for peer grading
• Specific peer-evaluation training – dedicated resources for each assignment:
• a benchmark assignment and a tutorial video
• an interactive grading rubric
• a discussion thread (1649 posts in total)
Q2: Is individual peer grading as accurate as instructor grading?
• Share of grades within ±5% and ±10% of the “real” grade:
– Instructors (Suchaut, B., 2008): 39% and 65%
– Our MOOC students (our data): 36% and 60%
• ... but this is individual student grading. Does taking the average of several peer grades, rather than a single one, perform better?
– Our MOOC students, average of 3-5 grades: 56% and 82%
• The average grade given by MOOC students is more accurate than the instructor's grade.
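The ±5%/±10% agreement rates above can be sketched with a few lines of code. This is a minimal illustration, not the authors' pipeline: the grades below are made up, and a 0-100 scale with tolerances of 5 and 10 grade points is assumed.

```python
def agreement_rate(estimates, reference, tol):
    """Fraction of estimated grades within ±tol grade points of the reference."""
    hits = sum(1 for e, r in zip(estimates, reference) if abs(e - r) <= tol)
    return hits / len(estimates)

# Hypothetical data: 3 peer grades per assignment and one instructor grade,
# on an assumed 0-100 scale.
instructor = [62, 75, 80, 55]
peer_grades = [[60, 66, 61], [70, 78, 74], [85, 79, 82], [40, 58, 57]]

# Averaging the peer grades before comparing, as in the slides.
peer_means = [sum(g) / len(g) for g in peer_grades]
rate5 = agreement_rate(peer_means, instructor, tol=5)
rate10 = agreement_rate(peer_means, instructor, tol=10)
```

The same function applied to single peer grades instead of `peer_means` reproduces the individual-grader comparison.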
Q3: Best algorithm – average or median?
“Error functions”: difference between instructor grades and either the average or the median of the student grades.
The average is slightly more accurate than the median.
Q4: How many peer grades are needed to correctly estimate the “best grade”?
• Peer grading quickly performs better than instructor grading (from two peers onwards)
• The best “return” is obtained with 3-4 peer grades
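One way to see how accuracy grows with the number of graders is to subsample k peer grades per assignment and measure the mean absolute error of their average against the instructor's grade. This is a sketch under assumptions: the grades are hypothetical and the subsampling scheme is mine, not the study's.

```python
import random
from statistics import mean

def mae_of_k_peers(peer_grades, instructor, k, trials=200, seed=0):
    """Mean absolute error of the average of k randomly chosen peer grades
    against the instructor grade, over all assignments and random trials."""
    rng = random.Random(seed)
    errors = []
    for grades, ref in zip(peer_grades, instructor):
        for _ in range(trials):
            sample = rng.sample(grades, k)
            errors.append(abs(mean(sample) - ref))
    return mean(errors)

# Hypothetical data: 4 peer grades per assignment, one instructor grade.
peer_grades = [[50, 60, 70, 80], [72, 75, 78, 81], [30, 45, 50, 55]]
instructor = [65, 76, 45]

mae_1 = mae_of_k_peers(peer_grades, instructor, k=1)
mae_3 = mae_of_k_peers(peer_grades, instructor, k=3)
```

Plotting the error for k = 1, 2, 3, 4 on real data is what reveals the diminishing returns beyond 3-4 graders noted above.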
Improving peer-evaluation monitoring and grade processing in MOOC GdP 4 and 5
• Estimate the quality of the grades issued by peers
• Act on this information:
– a dedicated VBA/Excel application => feedback on whether each grade was correct, too high, or too low
– ... reward accurate grading
– track whether peer grading improved over the course
• Add self-evaluation – the best source for learning
– a new system, developed for Canvas in association with Unow
– students were asked to take a fresh look at their own work and grade it after (1) having evaluated at least 3 other students' assignments and (2) receiving feedback on their own assignment from other students
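The "correct / too high / too low" feedback on each peer grade could be sketched as below. The reference grade and the ±5-point band are assumptions for illustration; the slides do not state the thresholds the VBA/Excel application actually used.

```python
def classify_peer_grade(grade, reference, tol=5):
    """Label a single peer grade against a reference grade (e.g. the mean of
    all peer grades, or the instructor's grade). The ±tol band is an
    assumption; the actual threshold is not given in the slides."""
    if abs(grade - reference) <= tol:
        return "correct"
    return "high" if grade > reference else "low"
```

For example, `classify_peer_grade(90, 70)` flags an overly generous grader, which is the kind of signal used to reward accurate grading.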
Conclusions
• Peer evaluation shows promising potential
• It is not easy to implement on a massive, open scale:
– assignments require careful work and beta testing (100 hours)
– new assignments/case studies for each session
– dedicated data processing and team expertise to develop
– a careful set-up:
• deadline reminders, targeted messages
• how each student gets feedback
• rewards for accurate grading
– monitoring: manual grading is still required (10% down to 1% of assignments)
Recommendations for researchers
• Look closely at the distribution of peer grades before hypothesis testing.
• How many assignments should a student be required to grade? We recommend 4:
– it accounts for peers who drop out of the process
– it leaves time to work on self-assessment.
• Which algorithm should be preferred?
– the average, if the grading data has been properly checked and filtered
– otherwise, the median is more robust (just remove outliers and collect more evaluations).
• When to switch from automatic peer grading to manual instructor grading?
1. fewer than 2 peer grades
2. no consensus (i.e. standard deviation of the peer grades > 20)
3. presence of a “0” grade
… in GdP4, 10%, 9%, and 1.6% of assignments were graded manually under rules 1, 2, and 3 respectively.
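The three switch-to-manual rules can be expressed as a small routing function. This is a sketch: the function name is mine, the rules are checked in the order the slides list them, and the population standard deviation is an assumption (the slides do not say which estimator was used).

```python
from statistics import pstdev

def needs_manual_grading(peer_grades, sd_threshold=20):
    """Apply the three trigger rules from the slides, in order: fewer than
    2 peer grades, no consensus (standard deviation above the threshold),
    or any zero grade routes the assignment to an instructor.
    Returns the triggering rule, or None if automatic grading is kept."""
    if len(peer_grades) < 2:
        return "too few grades"
    if pstdev(peer_grades) > sd_threshold:
        return "non-consensus"
    if 0 in peer_grades:
        return "zero grade present"
    return None
```

Note that rule order matters: a single "0" among otherwise high grades inflates the standard deviation and usually trips rule 2 before rule 3 is reached.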
Limitations of this study
• The theoretical framework and literature review remain to be developed
• Data processing: non-parametric testing remains to be implemented
“Does peer grading work? How to implement and improve it?”, European MOOCs Stakeholders Summit 2015, May 2015, Research Track.
https://goo.gl/3QCXDG
Peer Grading Research Track – Auditorium 4, Tuesday, 10 am
Thanks for listening!
• Twitter: @R_Bachelet, Google+: +Rémi Bachelet
• My contributions on MOOCs
• MOOC GdP
– Enroll: gestiondeprojet.pm
– English version of the courses in 2015-2016
– Twitter: #MOOCGdP
ANNEX: a glimpse at the stats
Q2: What data pre-processing should be used?
Methodology: histograms and density plots
Q2: Do grades follow a normal distribution?
Methodology – test of normality: Shapiro-Wilk test
shapiro.test(data)
- H0: the distribution is normal
- H1: the distribution is not normal
Results
Alpha threshold = 0.05: if p-value > 0.05 => keep H0; if p-value < 0.05 => reject H0 in favour of H1
p-value < 2.2e-16 < 0.05 => the grades do not follow a normal distribution
Q3: Similarity between peer grades and teacher grades? (1/2)
Methodology: scatter plot with the line (D): y = x
Methodology
Kendall correlation: cor.test(EP, Pairs, method="kendall")
Pearson correlation: cor.test(EP, Pairs)
Hypotheses:
- H0: the correlation is null
- H1: the correlation is not null
Threshold: 0.05; if p-value > 0.05 => keep H0; if p-value < 0.05 => reject H0 in favour of H1
Results: p-value < 0.05 => there is a correlation; correlation > 0.5 => a strong correlation
Correlation(EP, mean(peer grades)):
- Pearson: cor = 0.77251, p-value < 2.2e-16
- Kendall: tau = 0.6336516, p-value < 2.2e-16
Q3: Similarity between peer grades and teacher grades? (2/2)
Q4: Best algorithm – average or median?
Study of the “error functions”:
ErreurMoy = mean(peer grades) – instructor team grade
ErreurMed = median(peer grades) – instructor team grade
Study of the errors introduced:
ErreurMoy < ErreurMed (in absolute value) => the mean (average) is best
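The two error functions above translate directly into code. A minimal sketch, with made-up grades; the English names stand in for ErreurMoy and ErreurMed:

```python
from statistics import mean, median

def error_functions(peer_grades, instructor_grade):
    """ErreurMoy and ErreurMed from the slides: signed difference between
    the mean (resp. median) of the peer grades and the instructor grade."""
    err_moy = mean(peer_grades) - instructor_grade
    err_med = median(peer_grades) - instructor_grade
    return err_moy, err_med

# Toy example: here the mean of the peer grades matches the instructor
# grade exactly, while the median undershoots it.
err_moy, err_med = error_functions([10, 12, 20], 14)
```

Comparing |ErreurMoy| and |ErreurMed| over all assignments is the "Ecart" analysis on the next slide.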
Q4: Best algorithm – average or median?
Study of the difference between the two errors:
Ecart = |ErreurMoy| – |ErreurMed|
median(Ecart) = -0.7500, mean(Ecart) = -0.9867 => |median error| > |mean error|
Coefficient of skewness: -0.2145285 < 0 => more negative than positive values
The median introduces slightly more error than the average.