Is peer review any good? A quantitative analysis of peer review
Uploaded by aliaksandr-birukou (category: Education)
Fabio Casati, Maurizio Marchese, Azzurra Ragone, Matteo Turrini
University of Trento
http://eprints.biblio.unitn.it/archive/00001654/01/techRep045.pdf
Initial Goals
• Understand how well peer review works
• Understand how to improve the process
• Metrics + analysis (refer to liquid doc)
• Focus only on the gatekeeping aspect
“Not everything that can be counted counts, and not everything that counts can be counted.” -- often attributed to Albert Einstein
Metric Dimensions
• Dimensions: quality, fairness, efficiency
• Metrics: Kendall distance, divergence, disagreement, biases, robustness, unbiasing
• Effort vs. quality; effort-invariant alternatives
Data Sets
• Around 7000 reviews from various conferences in the CS field (more on the way)
– Large, medium, and small conferences
– Some with “young reviewers”
Is peer review effective? Does it work?
• And what does it mean to be effective? HOW do we measure it?
• It is easier to measure/detect “problems”
• Peer review ranking vs. ideal ranking
Comparing rankings
[Figure: the same set of papers ordered by two different rankings, shown side by side]
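One common way to compare two rankings of the same papers is the Kendall tau distance: the number of paper pairs that the two rankings order differently. The minimal Python sketch below assumes both rankings list the same paper IDs, best first; the function name `kendall_distance` and the paper IDs are hypothetical, and normalizing by the number of pairs is one common convention rather than necessarily the one used in this work.

```python
from itertools import combinations


def kendall_distance(rank_a: list[str], rank_b: list[str], normalize: bool = True) -> float:
    """Kendall tau distance between two rankings of the same papers (best first).

    Counts discordant pairs: pairs ordered one way in rank_a and the other way
    in rank_b. With normalize=True the count is divided by the number of pairs,
    so 0.0 means identical order and 1.0 means exactly reversed order.
    """
    if len(rank_a) != len(rank_b) or set(rank_a) != set(rank_b):
        raise ValueError("both rankings must contain the same papers")
    position_in_b = {paper: i for i, paper in enumerate(rank_b)}
    discordant = 0
    # combinations() preserves input order, so x is ranked above y in rank_a
    for x, y in combinations(rank_a, 2):
        if position_in_b[x] > position_in_b[y]:
            discordant += 1
    if not normalize:
        return discordant
    n = len(rank_a)
    total_pairs = n * (n - 1) // 2
    return discordant / total_pairs if total_pairs else 0.0


# Hypothetical example: a peer review ranking vs. an "ideal" ranking
peer_ranking = ["p28", "p45", "p89", "p33", "p17"]
ideal_ranking = ["p17", "p28", "p33", "p45", "p89"]
print(kendall_distance(peer_ranking, ideal_ranking))  # 0.6
```

Here 6 of the 10 paper pairs are ordered differently, hence 0.6.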
Ideal ranking (?)
• Success in a subsequent phase
• Citations
Suggested reading: Positional effect on citation and readership in arXiv, by Haque and Ginsparg
Comparing rankings
[Figure: example with t = 3, N = 10]
Divergence: Div(t, N); Kendall τ
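A possible reading of Div(t, N), stated here as an assumption: for two rankings of the same N papers, the fraction of the top t papers in one ranking that do not appear in the top t of the other. Under that reading Div is 0 when the two top-t sets coincide and 1 when they are disjoint. The minimal Python sketch below uses t = 3 and N = 10 as in the slide; the function name and the paper labels are hypothetical.

```python
def divergence(rank_a: list[str], rank_b: list[str], t: int) -> float:
    """Assumed Div(t, N): share of rank_a's top t missing from rank_b's top t."""
    n = len(rank_a)
    if len(rank_b) != n:
        raise ValueError("rankings must have the same length N")
    if not 1 <= t <= n:
        raise ValueError("t must be between 1 and N")
    top_a, top_b = set(rank_a[:t]), set(rank_b[:t])
    return (t - len(top_a & top_b)) / t


# N = 10 hypothetical papers A..J; t = 3
peer_ranking = list("ABCDEFGHIJ")
ideal_ranking = list("ADBCEFGHIJ")
print(divergence(peer_ranking, ideal_ranking, t=3))  # 0.3333333333333333
```

Paper C is in the peer review top 3 but not in the ideal top 3, so one of three slots diverges.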
Results: peer review ranking vs. citation count
[Plot: divergence Div (y-axis) vs. normalized t (x-axis)]
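A curve like the one in the plot can be produced by computing Div for every t and dividing t by N. A minimal Python sketch, assuming Div(t, N) is the fraction of one ranking's top t papers missing from the other's top t, and assuming both rankings come from per-paper scores (average review mark and citation count); all helper names and data are hypothetical. As a reference point, two independent random rankings share on average t²/N of their top t papers, so their expected divergence is 1 − t/N.

```python
def rank_by(scores: dict[str, float]) -> list[str]:
    """Rank papers by score, best first; ties broken by paper ID for determinism."""
    return sorted(scores, key=lambda paper: (-scores[paper], paper))


def divergence(rank_a: list[str], rank_b: list[str], t: int) -> float:
    """Assumed Div(t, N): share of rank_a's top t missing from rank_b's top t."""
    top_a, top_b = set(rank_a[:t]), set(rank_b[:t])
    return (t - len(top_a & top_b)) / t


def divergence_curve(rank_a: list[str], rank_b: list[str]) -> list[tuple[float, float]]:
    """(normalized t, Div) pairs for every t from 1 to N."""
    n = len(rank_a)
    return [(t / n, divergence(rank_a, rank_b, t)) for t in range(1, n + 1)]


def random_baseline(n: int) -> list[tuple[float, float]]:
    """Expected Div for two independent random rankings of n papers: 1 - t/n."""
    return [(t / n, 1 - t / n) for t in range(1, n + 1)]


# Hypothetical data: average review mark and citation count per paper
review_scores = {"p1": 7.5, "p2": 6.0, "p3": 8.0, "p4": 4.5}
citations = {"p1": 12, "p2": 30, "p3": 5, "p4": 1}
curve = divergence_curve(rank_by(review_scores), rank_by(citations))
print(curve)  # [(0.25, 1.0), (0.5, 0.5), (0.75, 0.0), (1.0, 0.0)]
```

Plotting `curve` next to `random_baseline(n)` shows at a glance whether the peer review ranking is closer to the citation ranking than a random ranking would be.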
Randomness and reliability
• Quality-related, but independent of the criteria for the “ideal” ranking:
– Basic statistics
– Disagreement
– Robustness
– Biases
Quality-related Metrics: Statistics
[Plot: distribution of marks (integer marks); probability (0 to 0.18) vs. mark (0 to 10)]
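The distribution in the plot is the share of reviews that gave each integer mark. A minimal Python sketch, assuming integer marks on the 0 to 10 scale shown on the x-axis; the function name and the sample marks are hypothetical.

```python
from collections import Counter


def mark_distribution(marks: list[int], low: int = 0, high: int = 10) -> dict[int, float]:
    """Empirical probability of each integer mark in [low, high]."""
    if not marks:
        raise ValueError("no marks given")
    if any(m < low or m > high for m in marks):
        raise ValueError(f"marks must lie in [{low}, {high}]")
    counts = Counter(marks)
    total = len(marks)
    # Counter returns 0 for marks that never occur, so they get probability 0.0
    return {mark: counts[mark] / total for mark in range(low, high + 1)}


sample_marks = [6, 7, 7, 8]
dist = mark_distribution(sample_marks)
print(dist[7])  # 0.5
```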
Disagreement
• Measures the difference between the marks given by the reviewers to the same contribution.
• The rationale behind this metric is that in a review process we expect some agreement between reviewers.
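The slides do not give an exact formula for disagreement, so the minimal Python sketch below assumes one plausible definition: for each paper, the mean absolute difference between every pair of marks it received, divided by the width of the mark scale (10), then averaged over all papers with at least two reviews. Function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def paper_disagreement(marks: list[float], mark_range: float = 10.0) -> float:
    """Mean absolute pairwise difference between one paper's marks, scaled to [0, 1]."""
    pairs = list(combinations(marks, 2))
    if not pairs:
        raise ValueError("need at least two marks for the same paper")
    return mean(abs(a - b) for a, b in pairs) / mark_range


def process_disagreement(reviews: dict[str, list[float]], mark_range: float = 10.0) -> float:
    """Average paper disagreement over papers that received at least two marks."""
    return mean(
        paper_disagreement(marks, mark_range)
        for marks in reviews.values()
        if len(marks) >= 2
    )


reviews = {"p1": [6, 8, 7], "p2": [3, 3]}
print(process_disagreement(reviews))
```

Using all pairs rather than, say, max minus min keeps the measure comparable between papers with two and with three or more reviewers.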
Normalized Disagreement (after discussion)
                C1      C2                      C3
Computed        0.27    0.32 (high variance)    0.26 (high variance)
Reshuffled      0.34    0.40                    0.32
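The “Reshuffled” row is read here, as an assumption, as a random baseline: all marks are pooled and randomly redistributed over the reviews, keeping each paper's number of reviews, and disagreement is recomputed. Under that reading, a computed value close to the reshuffled one would mean reviewers agree little more than randomly assigned marks would. A minimal Python sketch, assuming disagreement is the mean absolute pairwise mark difference per paper scaled by the 10-point range; all names and data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def process_disagreement(reviews: dict[str, list[float]], mark_range: float = 10.0) -> float:
    """Mean over papers of the mean absolute pairwise mark difference, scaled to [0, 1]."""
    per_paper = [
        mean(abs(a - b) for a, b in combinations(marks, 2)) / mark_range
        for marks in reviews.values()
        if len(marks) >= 2
    ]
    return mean(per_paper)


def reshuffle(reviews: dict[str, list[float]], rng: random.Random) -> dict[str, list[float]]:
    """Randomly redistribute all marks, keeping each paper's number of reviews."""
    pool = [mark for marks in reviews.values() for mark in marks]
    rng.shuffle(pool)
    shuffled, start = {}, 0
    for paper, marks in reviews.items():
        shuffled[paper] = pool[start:start + len(marks)]
        start += len(marks)
    return shuffled


def reshuffled_disagreement(reviews: dict[str, list[float]], runs: int = 200, seed: int = 0) -> float:
    """Average disagreement over several random reshuffles (seeded for reproducibility)."""
    rng = random.Random(seed)
    return mean(process_disagreement(reshuffle(reviews, rng)) for _ in range(runs))


reviews = {"p1": [2, 2], "p2": [9, 9], "p3": [5, 6]}
print(process_disagreement(reviews), reshuffled_disagreement(reviews))
```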
Robustness
• Sensitivity to small variations in the marks
– Assesses the impact of small indecision when giving a mark (e.g., 6 vs. 7)
• Measures divergence after applying an ε-variation to the marks
• Results: reasonably robust, except for the conference managed by young researchers
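A minimal Python sketch of such a robustness check, under stated assumptions: each mark is shifted by an independent random amount in [−ε, +ε] and clipped to the 0 to 10 scale, papers are ranked by average mark, and robustness is reported as the average Div(t, N) between the original and perturbed rankings, with Div assumed to be the share of one ranking's top t papers missing from the other's top t. Function names, parameters, and data are hypothetical.

```python
import random
from statistics import mean


def rank_by_mean(reviews: dict[str, list[float]]) -> list[str]:
    """Papers ordered by average mark, best first; ties broken by paper ID."""
    avg = {paper: mean(marks) for paper, marks in reviews.items()}
    return sorted(avg, key=lambda paper: (-avg[paper], paper))


def divergence(rank_a: list[str], rank_b: list[str], t: int) -> float:
    """Assumed Div(t, N): share of rank_a's top t missing from rank_b's top t."""
    top_a, top_b = set(rank_a[:t]), set(rank_b[:t])
    return (t - len(top_a & top_b)) / t


def perturb(reviews: dict[str, list[float]], eps: float, rng: random.Random,
            low: float = 0.0, high: float = 10.0) -> dict[str, list[float]]:
    """Apply an independent epsilon-variation to every mark, clipped to [low, high]."""
    return {
        paper: [min(high, max(low, m + rng.uniform(-eps, eps))) for m in marks]
        for paper, marks in reviews.items()
    }


def robustness(reviews: dict[str, list[float]], eps: float, t: int,
               runs: int = 100, seed: int = 0) -> float:
    """Average divergence between original and perturbed rankings (0 = fully robust)."""
    rng = random.Random(seed)
    original = rank_by_mean(reviews)
    return mean(
        divergence(original, rank_by_mean(perturb(reviews, eps, rng)), t)
        for _ in range(runs)
    )


reviews = {"p1": [7, 8], "p2": [7, 7], "p3": [4, 5], "p4": [2, 3]}
print(robustness(reviews, eps=1.0, t=1), robustness(reviews, eps=1.0, t=2))
```

In this made-up data the top 2 cannot change under ε = 1, while the order of p1 and p2 at the very top can.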
Metric Dimensions
• Dimensions: quality, fairness, efficiency
• Metrics: statistics, Kendall distance, divergence, disagreement, biases, robustness, unbiasing, effort
Fairness
• Definition: a review process is fair if and only if the acceptance of a contribution does not depend on the particular set of PC members that reviews it.
• The key is the assignment of papers to reviewers: a paper assignment is unfair if the specific assignment influences (makes more predictable) the fate of the paper.
Potential biases
• Rating bias: reviewers are biased if they consistently give higher/lower marks than their colleagues who are reviewing the same paper
• Affiliation bias
• Topic bias
• Country bias
• Gender bias
• …
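Rating bias as defined above can be estimated by comparing each mark with the average of the other marks the same paper received. A minimal Python sketch, assuming a reviewer's bias is the mean of those differences over all of their reviews; the normalization behind the “normalized” biases reported in the slides is not specified and is left out here. Names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rating_biases(reviews: list[tuple[str, str, float]]) -> dict[str, float]:
    """Estimate each reviewer's rating bias from (paper_id, reviewer_id, mark) triples.

    For every review of a paper with at least two reviews, take the mark minus the
    mean mark of the other reviewers of that paper; a reviewer's bias is the mean
    of these differences. Positive = more generous, negative = harsher.
    """
    by_paper: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for paper, reviewer, mark in reviews:
        by_paper[paper].append((reviewer, mark))
    deltas: dict[str, list[float]] = defaultdict(list)
    for entries in by_paper.values():
        if len(entries) < 2:
            continue  # no colleagues to compare with
        total = sum(mark for _, mark in entries)
        for reviewer, mark in entries:
            others_mean = (total - mark) / (len(entries) - 1)
            deltas[reviewer].append(mark - others_mean)
    return {reviewer: mean(d) for reviewer, d in deltas.items()}


reviews = [
    ("p1", "alice", 8), ("p1", "bob", 5),
    ("p2", "alice", 7), ("p2", "carol", 4), ("p2", "bob", 4),
]
print(rating_biases(reviews))  # {'alice': 3.0, 'bob': -2.25, 'carol': -1.5}
```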
Computed Normalized Rating Biases
                    C2       C3       C4
Top accepting       3.44     1.52     1.17
Top rejecting      -2.78    -2.06    -1.17
> +|min bias|       5%       9%       7%
< -|min bias|       4%       8%       7%
                                          C2     C3     C4
Unbiasing effect (divergence)             9%     11%    14%
Unbiasing effect (reviewers affected)     16     5      4
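One way to read the unbiasing rows, stated as an assumption: reviewers whose estimated rating bias exceeds a threshold (|min bias|) have that bias subtracted from their marks, papers are re-ranked by average mark, and the effect is measured as the divergence between the original and corrected rankings plus the number of reviewers corrected. The minimal Python sketch below implements that reading. The threshold, helper names, and data are hypothetical. A reviewer's bias is assumed to be their mean difference from co-reviewers' average marks, and Div is assumed to be the share of one ranking's top t papers missing from the other's top t.

```python
from collections import defaultdict
from statistics import mean

Review = tuple[str, str, float]  # (paper_id, reviewer_id, mark)


def rating_biases(reviews: list[Review]) -> dict[str, float]:
    """Mean difference between a reviewer's mark and co-reviewers' average mark."""
    by_paper: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for paper, reviewer, mark in reviews:
        by_paper[paper].append((reviewer, mark))
    deltas: dict[str, list[float]] = defaultdict(list)
    for entries in by_paper.values():
        if len(entries) < 2:
            continue
        total = sum(mark for _, mark in entries)
        for reviewer, mark in entries:
            deltas[reviewer].append(mark - (total - mark) / (len(entries) - 1))
    return {reviewer: mean(d) for reviewer, d in deltas.items()}


def rank_by_mean(reviews: list[Review]) -> list[str]:
    """Papers ordered by average mark, best first; ties broken by paper ID."""
    marks: dict[str, list[float]] = defaultdict(list)
    for paper, _, mark in reviews:
        marks[paper].append(mark)
    avg = {paper: mean(m) for paper, m in marks.items()}
    return sorted(avg, key=lambda paper: (-avg[paper], paper))


def divergence(rank_a: list[str], rank_b: list[str], t: int) -> float:
    """Assumed Div(t, N): share of rank_a's top t missing from rank_b's top t."""
    top_a, top_b = set(rank_a[:t]), set(rank_b[:t])
    return (t - len(top_a & top_b)) / t


def unbiasing_effect(reviews: list[Review], min_bias: float, t: int) -> tuple[float, int]:
    """Return (divergence after unbiasing, number of reviewers corrected)."""
    biases = rating_biases(reviews)
    affected = {r for r, b in biases.items() if abs(b) > min_bias}
    corrected = [
        (paper, reviewer, mark - biases[reviewer] if reviewer in affected else mark)
        for paper, reviewer, mark in reviews
    ]
    return divergence(rank_by_mean(reviews), rank_by_mean(corrected), t), len(affected)


reviews = [
    ("p1", "alice", 9), ("p1", "bob", 5),
    ("p2", "alice", 8), ("p2", "carol", 6),
    ("p3", "bob", 6), ("p3", "carol", 7),
]
print(unbiasing_effect(reviews, min_bias=1.0, t=1))  # (1.0, 2)
```

Here the generous reviewer (alice) and the harsh one (bob) are corrected, and p3 moves from last to first.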