eMOOCs2015 Does peer grading work?

Post on 19-Jul-2015


Does peer grading work? How to implement and improve it?

Comparing instructor and peer assessment in MOOC GdP

Rémi Bachelet, Drissa Zongo, Aline Bourelle

Download this slideshow : http://goo.gl/GiFvXb

Massive evaluation in MOOCs : Peer assessment vs. Quizzes

• Quizzes

– Massive scale, but

• unable to process, grade and provide feedback for complex and open-ended student assignments

• no critical thinking

• Peer assessment

– Evaluating rich assignments on a massive scale: possible? Accurate?

– Major learning benefits expected

• student autonomy, teaching paradigm shift

• in Bloom's taxonomy, higher levels of learning


4 Research questions

1. How to train MOOC students to grade their peers and provide constructive feedback?

– Qualitative/experience testing

2. Is peer grading as accurate as instructor grading? Superior?

– Quantitative data/hypothesis testing

3. Which grading algorithm is best?

– Quantitative data/hypothesis testing

4. How many peer grades are required to provide an accurate final grade?

– Quantitative data/hypothesis testing


“Fundamentals of project management” MOOC / MOOC GdP, session 2

• Dataset: 1011 to 831 assignments submitted each week, for 5 weeks

– 4650 assignments total.

• Variety of assignments

– (next slide)

• Both instructor and peer grading were available

– 3-5 peer grades and one instructor/AT grade


5 assignments


Q1: How to train students to grade their peers and provide constructive feedback?

• Generic peer evaluation training

– Major requirement of the advanced track

– 2+ videos

• rationale and importance of peer assessment

• how to write motivating and constructive feedback

• guidelines on how to use the platform for peer grading

• Specific peer evaluation training

– Specific resources for each assignment

• benchmark assignment, tutorial video

• interactive grading rubric

• discussion thread (1649 total posts)

Q2: Is individual peer grading as accurate as instructor grading?

• Share of grades within ±5% and ±10% of the “real” grade

– Instructors => Suchaut, B. (2008) => 39% and 65%

– Our MOOC students => our data => 36% and 60%

… but this is individual student grading

Does averaging several peer grades perform better than using a single one?

– Our MOOC students => average of 3-5 grades => 56% and 82%

The average grade given by MOOC students is more accurate than an instructor's grade
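The comparison on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' processing: the function name `share_within`, the 0-100 grading scale, the reading of "±5%" as 5 points on that scale, and the toy grades are all assumptions.

```python
from statistics import mean


def share_within(grades, reference, tolerance):
    """Fraction of grades lying within +/- tolerance points of the reference grade."""
    hits = sum(1 for g, r in zip(grades, reference) if abs(g - r) <= tolerance)
    return hits / len(reference)


# Illustrative data only (not taken from the study), on an assumed 0-100 scale.
reference = [70, 55, 90, 40]  # assumed "real" grade of each assignment
peer_grades = [[62, 74, 75], [50, 58, 69], [95, 80, 92], [20, 45, 43]]

single = [grades[0] for grades in peer_grades]      # one peer grade per assignment
averaged = [mean(grades) for grades in peer_grades]  # average of all peer grades

print(share_within(single, reference, 5))    # individual peer grading
print(share_within(averaged, reference, 5))  # averaged peer grading
```

Running the same function with a tolerance of 10 gives the second figure of each pair reported above.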


Q3: best algorithm: average or median?

“Error functions”: difference between the instructor grade and either the average or the median of the students' grades.

Average slightly more accurate than median

Q4: How many peer grades to correctly estimate “best grade”?

Peer grading quickly performs better than instructor grading (with as few as two peers)

Best “return” with 3-4 peer grades
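One way to study this question, sketched here in Python under stated assumptions: average only the first k peer grades of each assignment and measure how far that average is from a reference grade. The function name `mean_abs_error_by_k`, the use of mean absolute error, and the toy data are hypothetical and not taken from the study.

```python
from statistics import mean


def mean_abs_error_by_k(peer_grades, reference, max_k):
    """For k = 1..max_k, average the first k peer grades of each assignment
    and return the mean absolute difference from the reference grade."""
    errors = {}
    for k in range(1, max_k + 1):
        diffs = [abs(mean(grades[:k]) - ref)
                 for grades, ref in zip(peer_grades, reference)
                 if len(grades) >= k]  # skip assignments with fewer than k grades
        if diffs:
            errors[k] = mean(diffs)
    return errors


# Illustrative data only: two assignments with four peer grades each.
reference = [70, 50]
peer_grades = [[60, 80, 70, 70], [58, 46, 49, 47]]

errors = mean_abs_error_by_k(peer_grades, reference, 4)
for k, err in errors.items():
    print(k, err)
```

Plotting the error against k on real data would show where adding another peer grade stops paying off.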


Improving peer evaluation monitoring and grade processing in MOOC GdP 4 and 5

• Estimate the quality of grades issued by peers

• Act on this information:

– dedicated VBA/Excel application => feedback on whether each grade was correct, high or low

– reward accurate grading

– track whether peer grading improved over time during the course

• Add self-evaluation: best source for learning

– New system, developed for Canvas in association with Unow

– Students were asked to take a fresh look at their own work and grade it after (1) having evaluated at least 3 other students' assignments and (2) getting feedback on their own assignment from other students.
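The "correct, high or low" feedback could work roughly as follows. This Python sketch is not the VBA/Excel application mentioned above: the function name `grade_feedback`, the comparison against the assignment's final grade, and the 10-point tolerance are all assumptions made for illustration.

```python
def grade_feedback(peer_grade, final_grade, tolerance=10):
    """Classify one peer grade against the final grade of the same assignment.

    tolerance is a hypothetical threshold, in points on the grading scale.
    """
    diff = peer_grade - final_grade
    if diff > tolerance:
        return "high"
    if diff < -tolerance:
        return "low"
    return "correct"


print(grade_feedback(85, 70))  # grader was more generous than the final grade
print(grade_feedback(62, 70))  # grader was close to the final grade
print(grade_feedback(40, 70))  # grader was harsher than the final grade
```

Counting the share of "correct" labels per grader, week by week, would be one simple way to reward accurate grading and to track whether it improves during the course.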


Conclusions

• Peer evaluation displays promising potential

• Not easy to implement on a massive, open scale

– Assignments = careful work, beta testing (100 hours)

– New assignments/case study for each session

– Dedicated data processing, develop team expertise

– Carefully set up:

• Deadline reminders, targeted messages

• How each student gets feedback

• Rewards for accurate grading

• Monitoring: manual grading is still required (for 10% down to 1% of assignments)


Recommendations for researchers

• Look closely at the distribution of peer grades before hypothesis testing

• How many assignments should a student be required to grade? We recommend 4

– accounting for peers who drop out of the process

– time to work on self-assessment

• What algorithm should be preferred?

– average if grading data has been correctly checked and filtered

– otherwise, median is more robust (just remove outliers and get more evaluations)

• When to switch from automatic peer grading to manual instructor grading?

1. fewer than 2 peer grades

2. non-consensus (i.e. peer grades standard deviation > 20)

3. presence of a “0” grade

In GdP4, 10%, 9% and 1.6% of assignments 1, 2 and 3 respectively were graded manually.
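The three switching rules translate directly into a small check. The sketch below is in Python and makes a few assumptions the slides do not state: the function name `needs_manual_grading` is hypothetical, grades are taken to be on a 0-100 scale, and the sample standard deviation is used (the population version would be an equally plausible reading).

```python
from statistics import stdev


def needs_manual_grading(peer_grades, max_sd=20):
    """Return True when an assignment should go to an instructor."""
    if len(peer_grades) < 2:   # rule 1: fewer than 2 peer grades
        return True
    if 0 in peer_grades:       # rule 3: presence of a "0" grade
        return True
    # rule 2: non-consensus, measured by the sample standard deviation
    return stdev(peer_grades) > max_sd


print(needs_manual_grading([75]))          # too few grades
print(needs_manual_grading([70, 0, 80]))   # contains a zero
print(needs_manual_grading([40, 90, 65]))  # peers disagree strongly
print(needs_manual_grading([70, 75, 80]))  # consensus, stays automatic
```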


Limitations of this study

• Develop theoretical framework & literature review

• Data processing: implement non-parametric testing


« Does peer grading work? How to implement and improve it? ». European MOOCs Stakeholders Summit 2015, May 2015, Research Track

https://goo.gl/3QCXDG


Peer Grading Research Track - Auditorium 4, Tuesday, 10am

Thanks for listening!

• Twitter : @R_Bachelet, Google+ : +Rémi Bachelet

• My contributions on MOOCs

• MOOC GdP

– Enroll : gestiondeprojet.pm

– English version of courses in 2015-2016

– Twitter : #MOOCGdP


Academic year 2013/2014

ANNEX: A glimpse at the stats


Q2: What data pre-processing is to be used?

Methodology: histograms and density plots


Q2 : Do grades follow a normal distribution?

Methodology

Test of normality: Shapiro-Wilk test.

shapiro.test(data)

- H0: normal distribution

- H1: not a normal distribution

Threshold: alpha = 0.05

if p-value > 0.05 => H0

if p-value < 0.05 => H1

Results

p-value < 2.2e-16 < 0.05 => not a normal distribution


Q3 : Similarity between peer grades and teacher grades? (1/2)

Methodology

Scatter plot and line (D): y = x


Methodology

Kendall correlation: cor.test(EP, Pairs, method="kendall")

Pearson correlation: cor.test(EP, Pairs)

Hypothesis:

- H0: the correlation is null

- H1: the correlation is not null

Threshold: 0.05

if p-value > 0.05 => H0

if p-value < 0.05 => H1

p-value < 0.05 => there is a correlation

correlation > 0.5 => strong correlation

Correlation between EP and Mean(peers grades):

Method    Correlation        p-value
Pearson   cor = 0.77251      < 2.2e-16
Kendall   tau = 0.6336516    < 2.2e-16
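For readers without R at hand, the two coefficients can be computed with the Python standard library alone. This is a sketch, not the authors' analysis: it returns only the coefficients (no p-values), it uses the tau-b variant of Kendall's coefficient, and the variable `ep` is read here as the instructor team's grades, which is an assumption. The data are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall's tau-b: rank correlation with a correction for ties."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied on both variables: ignored
        elif dx == 0:
            tied_x += 1           # tied on x only
        elif dy == 0:
            tied_y += 1           # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n0 = concordant + discordant
    return (concordant - discordant) / sqrt((n0 + tied_x) * (n0 + tied_y))


# Illustrative data only (not from the study).
ep = [10, 20, 30, 40, 50]           # assumed instructor team grades
peers_mean = [12, 25, 22, 45, 48]   # assumed mean peer grade per assignment

print(pearson(ep, peers_mean))
print(kendall_tau_b(ep, peers_mean))
```

Because the Shapiro-Wilk test rejected normality for these grades, the rank-based Kendall coefficient is the safer of the two to interpret.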


Q3 : Similarity between peer grades and teacher grades? (2/2)

Q4: best algorithm: average or median?

Study of the « error function »

ErreurMoy = Mean(peers grades) – Instructor Team grades

ErreurMed = Median(peers grades) – Instructor Team grades

Study of the errors introduced

ErreurMoy < ErreurMed

Mean (average) is the best


Q4: best algorithm: average or median ?

study of the difference between the two errors

Ecart = |ErreurMoy| – |ErreurMed|

Ecart: median = -0.7500, mean = -0.9867 => |median error| > |mean error|

Coefficient of skewness: -0.2145285 < 0 => more negative than positive values

Median introduces slightly more errors than average
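The quantities on the last two slides (ErreurMoy, ErreurMed, Ecart and the skewness of Ecart) can be reproduced with a short Python sketch. The function names and the toy data are hypothetical, and the skewness formula used here (third central moment over the 1.5 power of the variance, with population moments) is one common definition; the slides do not say which one was used.

```python
from statistics import mean, median


def error_gap(peer_grades, instructor_grades):
    """Per-assignment Ecart = |ErreurMoy| - |ErreurMed|.

    A negative value means the mean of the peer grades was closer to the
    instructor grade than their median.
    """
    gaps = []
    for grades, ref in zip(peer_grades, instructor_grades):
        erreur_moy = mean(grades) - ref
        erreur_med = median(grades) - ref
        gaps.append(abs(erreur_moy) - abs(erreur_med))
    return gaps


def skewness(values):
    """Moment coefficient of skewness (assumes the values are not all equal)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Illustrative data only (not from the study).
peer_grades = [[60, 70, 95], [50, 52, 54], [80, 85, 60]]
instructor_grades = [75, 52, 78]

gaps = error_gap(peer_grades, instructor_grades)
print(gaps, median(gaps), mean(gaps), skewness(gaps))
```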