
Page 1

Quality criteria for data aggregation used in academic rankings

IREG FORUM on University rankings Methodologies under scrutiny

16-17 May 2013, Warsaw, Poland

Michaela Saisana [email protected]

European Commission, Joint Research Centre,

Econometrics and Applied Statistics Unit

Page 2

Outline
• Global rankings at the forefront of the policy debate
• Overview of two global university rankings (ARWU, THES)
• Statistical coherence tests
• Uncertainty analysis
• Policy implications
• Conclusions

Page 3

Outline
• Global rankings at the forefront of the policy debate
• Overview of two global university rankings (ARWU, THES)
• Statistical coherence tests
• Uncertainty analysis
• Policy implications
• Conclusions

Page 4

• The definition of a university is broad: a university – as the name suggests – tends to encompass a broad range of purposes, dimensions, foci and missions that are difficult to condense into a compact measure.

• Still, for reasons of governance, accountability and transparency, there is increasing interest among policymakers and practitioners in measuring and benchmarking "excellence" across universities.

• The growing mobility of students and researchers has also created a market for these measures among prospective students and their families.

Global rankings at the forefront of the policy debate

Page 5

• Global rankings have prompted debates and policy responses (at EU and national level):
  – to improve the positioning of a country within the existing measures,
  – to create new measures,
  – to discuss regional performance (e.g. to show that the USA is well ahead of Europe in terms of cutting-edge university research).

Global rankings at the forefront of the policy debate

Page 6

[Figure: Google Scholar hits on "university rankings" per year, 1940-2020 – a roughly 10-fold increase in the last 10 years.]

Guess how many of them contain the phrase "THES ranking" or "ARWU ranking"? About 20%.

Global rankings at the forefront of the policy debate

Page 7

1. Academic Ranking of World Universities (ARWU) (Shanghai Jiao Tong University), 2003
2. Webometrics (Spanish National Research Council), 2003
3. World University Ranking (Times Higher Education/Quacquarelli Symonds), 2004-09
4. Performance Ranking of Scientific Papers for Research Universities (HEEACT), 2007
5. Leiden Ranking (Centre for Science & Technology Studies, University of Leiden), 2008
6. World's Best Colleges and Universities (US News and World Report), 2008
7. SCImago Institutional Rankings, 2009
8. Global University Rankings (RatER) (Rating of Educational Resources, Russia), 2009
9. Top University Rankings (Quacquarelli Symonds), 2010
10. World University Ranking (Times Higher Education/Thomson Reuters, THE-TR), 2010
11. U-Multirank (European Commission), 2011

Global rankings at the forefront of the policy debate

• Over 60 countries have introduced national rankings, and there are numerous regional, specialist and professional rankings.

Page 8

University rankings are used to judge the performance of university systems… whether or not this was intended by their proponents.

Global rankings at the forefront of the policy debate

Page 9

France: creation of 10 centres of higher-education excellence. The Minister of Education set a target of placing at least 10 French universities among the top 100 in ARWU by 2012. The President has put French standing in these international rankings at the forefront of the policy debate (Le Monde, 2008).

Italy: 0 universities in the top 100 of the ARWU ranking, seen as a failure of the national educational system.

Spain: 1 university in the top 200 of the ARWU, hailed as a great national achievement.

Global rankings at the forefront of the policy debate

Page 10

An OECD study shows that university leaders worldwide are concerned about ranking systems, with consequences for the strategic and operational decisions they take to improve their research performance.

(Hazelkorn, 2007)

There are over 16,000 HEIs, yet some of the global rankings capture merely the top 100 universities – less than 1%.

(Hazelkorn, 2013)

Global rankings at the forefront of the policy debate

Page 11

An extreme impact of Global Rankings

What – In 2005, THES created a major controversy in Malaysia: the country's top two universities slipped by almost 100 places compared with 2004.

Why – A change in the ranking methodology (a fact not widely known, and of limited comfort).

Impact – A Royal Commission of inquiry was set up to investigate the matter. A few weeks later, the Vice-Chancellor of the University of Malaya stepped down.

Global rankings at the forefront of the policy debate

Page 12

• Global rankings at the forefront of the policy debate
• Overview of two global university rankings (ARWU, THES)
• Statistical coherence tests
• Uncertainty analysis
• Policy implications
• Conclusions

Page 13

Criteria | Indicator | Weight
Quality of Education | Alumni of an institution winning Nobel Prizes and Fields Medals | 10%
Quality of Faculty | Staff of an institution winning Nobel Prizes and Fields Medals | 20%
Quality of Faculty | Highly cited researchers in 21 broad subject categories | 20%
Research Output | Articles published in Nature and Science | 20%
Research Output | Articles in Science Citation Index-Expanded and Social Science Citation Index | 20%
Academic performance | Academic performance with respect to the size of an institution | 10%

PROS and CONS
+ 6 « objective » indicators
– Focus on research performance; overlooks other university missions
– Biased towards hard-science institutions
– Favours large institutions

METHODOLOGY
• 6 indicators
• Best performing institution = 100; scores of other institutions calculated as a percentage of it
• Weighting scheme chosen by the rankers
• Linear aggregation of the 6 indicators

Overview – 2007 ARWU ranking
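As a concrete illustration of this methodology box, here is a minimal Python sketch of percentage-of-best rescaling followed by a linear weighted aggregation. The indicator labels and the sample data are invented for the example; this is not ARWU's actual code or data.

```python
import numpy as np

# Nominal ARWU-style weights from the table above (illustrative labels).
WEIGHTS = {
    "alumni": 0.10, "awards": 0.20, "highly_cited": 0.20,
    "nature_science": 0.20, "publications": 0.20, "per_capita": 0.10,
}

def arwu_style_score(raw: dict) -> np.ndarray:
    """Rescale each indicator so the best institution scores 100,
    then aggregate linearly with the nominal weights."""
    n = len(next(iter(raw.values())))
    total = np.zeros(n)
    for name, weight in WEIGHTS.items():
        values = np.asarray(raw[name], dtype=float)
        rescaled = 100.0 * values / values.max()   # best performer = 100
        total += weight * rescaled                 # linear aggregation
    return total

# Invented figures for three hypothetical institutions.
rng = np.random.default_rng(0)
toy = {name: rng.uniform(1, 50, size=3) for name in WEIGHTS}
print(arwu_style_score(toy).round(1))
```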

Page 14

PROS and CONS
+ Attempt to take teaching quality into account
– Two expert-based indicators account for 50% of the total (subjective indicators, lack of transparency)
– Yearly changes in methodology
– Measures research quantity

METHODOLOGY
• 6 indicators
• z-score calculated for each indicator; best performing institution = 100; other institutions calculated as a percentage of it
• Weighting scheme chosen by the rankers
• Linear aggregation of the 6 indicators

Criteria | Indicator | Weight
Academic Opinion | Peer Review: opinion of 6,354 academics | 40%
Research Quality | Citations per Faculty: total citations / full-time-equivalent faculty | 20%
Graduate Employability | Recruiter Review: employers' opinion, 2,339 recruiters | 10%
International Outlook | International Faculty: percentage of international staff | 5%
International Outlook | International Students: percentage of international students | 5%
Teaching Quality | Student/Faculty: full-time-equivalent faculty-to-student ratio | 20%

Overview – 2007 THES ranking
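THES differs mainly in the normalisation step: z-scores rather than percentage of the best. A minimal sketch of that variant, again with invented labels and no real THES data:

```python
import numpy as np

# Nominal THES-style weights from the table above (illustrative labels).
THES_WEIGHTS = {
    "peer_review": 0.40, "citations_per_faculty": 0.20, "recruiter_review": 0.10,
    "intl_faculty": 0.05, "intl_students": 0.05, "faculty_student_ratio": 0.20,
}

def thes_style_score(raw: dict) -> np.ndarray:
    """z-score each indicator, then aggregate linearly with the nominal weights.
    The published scores are finally rescaled so the best institution reads 100."""
    n = len(next(iter(raw.values())))
    total = np.zeros(n)
    for name, weight in THES_WEIGHTS.items():
        x = np.asarray(raw[name], dtype=float)
        z = (x - x.mean()) / x.std()               # z-score per indicator
        total += weight * z                        # linear aggregation
    return total
```

Because a z-score removes each indicator's own scale, an indicator's contribution to the composite then depends on the weights and on its correlations with the other indicators, which is exactly what the coherence tests below probe.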

Page 15

1. Same top 10: Harvard, Cambridge, Princeton, Caltech, MIT and Columbia.
2. Greater variations in the middle to lower end of the rankings.
3. Europe is lagging behind in both the ARWU (also known as SJTU) and THES rankings.
4. THES favours UK universities: all UK universities lie below the line (in red).

Overview – Comparison (2007)

Page 16

University rankings – published yearly
+ Very appealing for capturing a university's multiple missions in a single number
+ Allow one to situate a given university in the worldwide context
– Can lead to misleading and/or simplistic policy conclusions

Page 17

Question: Can we say something about the quality of the university rankings and the reliability of the results?

Page 18

• Global rankings at the forefront of the policy debate
• Overview of two global university rankings (ARWU, THES)
• Statistical coherence tests
• Uncertainty analysis
• Policy implications
• Conclusions

Page 19

The Stiglitz report (p.65): […] a general criticism that is frequently addressed at composite indicators, i.e. the arbitrary character of the procedures used to weight their various components. […] The problem is not that these weighting procedures are hidden, non-transparent or non-replicable – they are often very explicitly presented by the authors of the indices, and this is one of the strengths of this literature. The problem is rather that their normative implications are seldom made explicit or justified.

Statistical coherence

Page 20

Question: Can we say something about the quality of the university rankings and the reliability of the results?

Page 21

Y = 0.5 x1 + 0.5 x2

Statistical coherence - Dean’s example

X1: hours of teaching

X2: # of publications

Estimated: R1² = 0.0759, R2² = 0.826, corr(x1, x2) = −0.151, V(x1) = 116, V(x2) = 614, V(y) = 162.

Page 22

To obviate this (with equal weights the composite is driven mostly by publications, the indicator with the far larger variance), the dean substitutes the model Y = 0.7 x1 + 0.3 x2, with X1: hours of teaching and X2: number of publications.

A professor comes by, looks at the last formula, and complains that publishing is disregarded in the department…

Statistical coherence – Dean's example
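To see the dean's problem numerically, one can simulate two indicators with roughly the variances quoted above and check how much of the composite's variance each one explains under the two weight sets. The data below are simulated for illustration only; the exact figures are not those of the original example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(30, np.sqrt(116), n)   # hours of teaching, V(x1) ≈ 116
x2 = rng.normal(20, np.sqrt(614), n)   # number of publications, V(x2) ≈ 614

for w1, w2 in [(0.5, 0.5), (0.7, 0.3)]:
    y = w1 * x1 + w2 * x2
    # Squared correlation of each indicator with the composite (R_i^2)
    r1 = np.corrcoef(x1, y)[0, 1] ** 2
    r2 = np.corrcoef(x2, y)[0, 1] ** 2
    print(f"weights ({w1}, {w2}):  R1^2 = {r1:.2f}   R2^2 = {r2:.2f}")
```

With equal nominal weights the composite is dominated by publications; with 0.7/0.3 the two indicators explain broadly similar shares, even though the nominal weights now look strongly tilted towards teaching – which is precisely the professor's misreading.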

Page 23

Using these points we can compute a statistic that tells us how important each indicator is.

Example: Si = 0.88 means we could reduce the variance of the ARWU scores by 88% by fixing 'Papers in Nature & Science'.

Si: a ruler for 'importance'

Statistical coherence

[Scatterplot: ARWU score against 'Papers in Nature & Science', with a smoothed conditional-mean curve.]

Page 24

Statistical coherence

First-order sensitivity index (Pearson's correlation ratio):

Si = V( E[y | xi] ) / V(y)

where E[y | xi] is the conditional mean of the composite given indicator xi (estimated by a smoothed curve) and V(y) is the unconditional variance.

Our suggestion: to assess the quality of a composite indicator, use this non-parametric equivalent Si instead of Ri² (the R-squared of the regression of y on xi, i.e. the squared Pearson product-moment correlation coefficient).
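One crude way to estimate this correlation ratio is to replace the smoothed curve by binned conditional means: cut the indicator into bins, average the composite within each bin, and compare the variance of those averages with the unconditional variance. The sketch below illustrates the idea only; it is not the estimator used in Paruolo et al. (2013).

```python
import numpy as np

def first_order_si(x: np.ndarray, y: np.ndarray, n_bins: int = 20) -> float:
    """Rough estimate of Si = V(E[y | x]) / V(y) using binned conditional means."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(x, edges[1:-1])                       # bin index 0..n_bins-1
    bins = [b for b in range(n_bins) if np.any(idx == b)]
    probs = np.array([np.mean(idx == b) for b in bins])     # P(bin)
    means = np.array([y[idx == b].mean() for b in bins])    # E[y | bin]
    grand = np.sum(probs * means)
    v_conditional = np.sum(probs * (means - grand) ** 2)    # V(E[y | x])
    return v_conditional / y.var()                          # divide by V(y)
```

Applied to each indicator in turn, the resulting Si values can be normalised to unit sum and set against the nominal weights, which is the comparison reported on the following slides.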

Page 25

Features:
• it offers a precise definition of importance, that is 'the expected reduction in variance of the CI that would be obtained if a variable could be fixed';

• it can be used regardless of the degree of correlation between variables;

• it is model-free, in that it can be applied also in non-linear aggregations;

• it is not invasive, in that no changes are made to the CI or to the correlation structure of the indicators (unlike what we will see next on uncertainty analysis).

Statistical coherence

Pearson's correlation ratio, also known as:
- First order effect
- Top marginal variance
- Main effect…

Source: Paruolo, Saisana, Saltelli, 2013, J.Royal Stat. Society A

Page 26

One can hence compare the importance of an indicator as given by the nominal weight (assigned by developers) with the importance as measured by the first order effect (Si) to test the index for coherence.

Statistical coherence

Page 27

Statistical coherence - ARWU

The Si's are more similar to each other than the nominal weights: the normalised Si's (scaled to unit sum; CV estimates) range between 0.14 and 0.19, whereas the nominal weights are either 0.10 or 0.20.

Source: Paruolo, Saisana, Saltelli, 2013, J.Royal Stat. Society A

Page 28

Statistical coherence - THES

• The combined importance of the peer-review variables (recruiters and academics) appears larger than stipulated by the developers, indirectly supporting the hypothesis of linguistic bias sometimes levelled at THES.

• The teacher/student ratio, a key variable aimed at capturing the teaching dimension, is much less important than it should be (normalized Si is 0.09, nominal weight is 0.20).

Source: Paruolo, Saisana, Saltelli, 2013, J.Royal Stat. Society A

Page 29

• Global rankings at the forefront of the policy debate
• Overview of two global university rankings (ARWU, THES)
• Statistical coherence tests
• Uncertainty analysis
• Policy implications
• Conclusions

Page 30

• Notwithstanding recent attempts to establish good practice in composite indicator construction (OECD, 2008), “there is no recipe for building composite indicators that is at the same time universally applicable and sufficiently detailed” (Cherchye et al., 2007).

• Booysen (2002, p.131) summarises the debate on composite indicators by noting that “not one single element of the methodology of composite indexing is above criticism”.

• Andrews et al. (2004) argue that "many indices rarely have adequate scientific foundations to support precise rankings: […] typical practice is to acknowledge uncertainty in the text of the report and then to present a table with unambiguous rankings".

Uncertainty analysis - Why?

Page 31

Space of alternatives:
• including/excluding variables
• normalisation
• missing data
• weights
• aggregation

Model averaging: whenever a choice in the composite set-up may not be strongly supported, or you do not trust a single model, we recommend using several models.

[Figure: distributions of simulated scores for three example countries (Country 1, Country 2, Country 3).]

Uncertainty analysis - How?

Page 32

How to shake coupled stairs

How coupled stairs are shaken in most of the available literature

Uncertainty analysis - How?

Page 33

Objective of UA:

NOT to verify whether the two global university rankings are legitimate models to measure university performance

To test whether the rankings and/or their associated inferences are robust or volatile with respect to changes in the methodological assumptions within a plausible and legitimate range.

Uncertainty analysis – ARWU & THES

Question: Can we say something about the quality of the university rankings and the reliability of the results?

Source: Saisana, D’Hombres, Saltelli, 2011, Research Policy 40, 165–177

Page 34

Activate simultaneously different sources of uncertainty (imputation, weighting, normalisation, number of indicators, aggregation) that cover a wide spectrum of methodological assumptions, and estimate the FREQUENCY of the university ranks obtained in the different simulations.

Assumption | Alternatives
Number of indicators | all six indicators included, or one-at-a-time excluded (6 options)
Weighting method | original set of weights, factor analysis, equal weighting, data envelopment analysis
Aggregation rule | additive, multiplicative, Borda multi-criterion

70 scenarios

Uncertainty analysis – ARWU & THES
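In code, this simulation plan is a loop over the assumption space. The sketch below varies the indicator set, two weighting schemes and two aggregation rules on invented data (factor-analysis weights, data envelopment analysis and the Borda rule are left out for brevity), then tallies how often each university lands in each rank bracket; it illustrates the mechanics, not the exact 70 scenarios of the study.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_uni, n_ind = 100, 6
X = rng.uniform(1, 100, size=(n_uni, n_ind))         # invented indicator matrix
nominal_w = np.array([0.10, 0.20, 0.20, 0.20, 0.20, 0.10])

def to_ranks(scores: np.ndarray) -> np.ndarray:
    """Rank 1 = best."""
    return np.argsort(np.argsort(-scores)) + 1

# Indicator sets: all six, or one excluded at a time.
indicator_sets = [list(range(n_ind))] + [
    [j for j in range(n_ind) if j != k] for k in range(n_ind)
]

simulated_ranks = []
for keep, weighting, rule in itertools.product(
        indicator_sets, ("nominal", "equal"), ("additive", "multiplicative")):
    w = nominal_w[keep] if weighting == "nominal" else np.ones(len(keep))
    w = w / w.sum()
    Z = 100.0 * X[:, keep] / X[:, keep].max(axis=0)   # best = 100 per indicator
    score = Z @ w if rule == "additive" else np.prod(Z ** w, axis=1)
    simulated_ranks.append(to_ranks(score))

R = np.vstack(simulated_ranks)                        # scenarios x universities

# Frequency with which university 0 falls in each 5-place bracket (1-5, 6-10, ...)
brackets = np.arange(1, n_uni + 2, 5)
freq, _ = np.histogram(R[:, 0], bins=brackets)
print((100 * freq / R.shape[0]).round(1))
```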

Page 35

Harvard, Stanford, Berkeley, Cambridge, MIT: top 5 in more than 75% of our simulations.

Univ California - San Francisco: original rank 18th, but could be ranked anywhere between the 6th and 100th position.

Impact of the assumptions: much stronger for the middle-ranked universities.

Legend: frequency lower than 15% / between 15% and 30% / between 30% and 50% / greater than 50%. Note: frequencies lower than 4% are not shown.

[Table: for each of the 20 top-ranked universities, the frequency (%) with which it falls in each five-place rank bracket (1-5, 6-10, …, 96-100) across the simulations, together with its original rank and country.]

Simulated rank range - SJTU 2008

Uncertainty analysis – ARWU

Page 36

The impact of the uncertainties on the university ranks is even more apparent.

M.I.T.: ranked 9th, but this rank is confirmed in only 13% of simulations (plausible range [4, 35]).

Very high volatility also for universities ranked in the 10th-20th positions, e.g. Duke Univ, Johns Hopkins Univ, Cornell Univ.

Legend: frequency lower than 15% / between 15% and 30% / between 30% and 50% / greater than 50%. Note: frequencies lower than 4% are not shown.

[Table: for each of the 20 top-ranked universities, the frequency (%) with which it falls in each five-place rank bracket (1-5, 6-10, …, 96-100) across the simulations, together with its original rank and country.]

Simulated rank range - THES 2008

Uncertainty analysis – THES

Page 37

[Figure: median rank (and 99% confidence interval) accounting for methodological uncertainties, for the 503 universities in the ARWU (SJTU) 2008 ranking. Labelled examples include Seoul National University, University of Frankfurt, University of Hamburg, University of California-Davis, University of Alaska-Fairbanks and Hanyang University.]

54 universities outside the interval (out of a total of 503) [43 universities in the Top 100]

Uncertainty analysis – ARWU results
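Given a matrix of simulated ranks such as the one produced in the earlier sketch, the median rank and a 99% interval per university are a couple of lines. The data here are again invented, purely to show the computation behind this kind of chart.

```python
import numpy as np

# Invented stand-in: rows = scenarios, columns = universities.
rng = np.random.default_rng(2)
R = rng.integers(1, 504, size=(70, 503))             # 70 scenarios, 503 universities
original_rank = np.arange(1, 504)                    # rank under the original methodology

median_rank = np.median(R, axis=0)
low, high = np.percentile(R, [0.5, 99.5], axis=0)    # 99% interval per university
outside = (original_rank < low) | (original_rank > high)
print(int(outside.sum()), "of", R.shape[1], "universities fall outside their 99% interval")
```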

Page 38

[Figure: median rank (and 99% confidence interval) accounting for methodological uncertainties, for the 400 universities in the THES 2008 ranking. Labelled examples include University of California, Santa Barbara; Stockholm School of Economics; University of St. Gallen; University of Tokyo; University of Leicester; and University La Sapienza, Roma.]

250 universities outside the interval (out of a total of 400) [61 universities in the Top 100]

Uncertainty analysis – THES results

Page 39

1. HEIs provide an array of services and positive externalities to society (universal education, innovation and growth, active citizens, capable entrepreneurs and administrators, etc.), which calls for multi-dimensional measures of effectiveness and/or efficiency.

2. A clear statement of the purpose of any such measure is also needed, as measuring scientific excellence is not the same as measuring e.g. employability or innovation potential, or where to study, or how to reform the university system so as to increase the visibility of national universities.

Policy implications

Page 40

3. Indicators and league tables are enough to start a discussion on higher education issues BUT not sufficient to conclude it.

4. Assigned university rank largely depends on the methodological assumptions made in compiling the rankings.

• 9 in 10 universities shift by more than 10 positions across the simulations of the 2008 SJTU ranking.

• 92 positions (Univ Autonoma Madrid) and 277 positions (Univ Zaragoza) in Spain,

• 71 positions (Univ Milan) and 321 positions (Polytechnic Inst Milan) in Italy,

• 22 positions (Univ Paris 06) and 386 positions (Univ Nancy 1) in France.

Policy implications

Page 41

5. A multi-modeling approach can offer a representative picture of the classification of universities by ranking institutions in a range bracket, as opposed to assigning a specific rank which is not representative of the plurality of opinions on how to assess university performance.

6. The compilation of university rankings should always be accompanied by coherence tests & robustness analysis.

Policy implications

Page 42

• ‘rankings are here to stay, and it is therefore worth the time and effort to get them right’

(Alan Gilbert, Nature News, 2007)

• ‘because they define what “world-class” is to the broadest audience, these measures cannot be ignored by anyone interested in measuring the performance of tertiary education institutions’

(Jamil Salmi, 2009)

Conclusions

Page 43

• 'rankings are here to stay' (Sanoff, 1998)
• 'ranking systems are clearly here to stay' (Merisotis, 2002)
• 'tables: they may be flawed but they are here to stay' (Leach, 2004)
• 'they are here to stay' (Hazelkorn, 2007)
• 'like them or not, rankings are here to stay' (Olds, 2010)
• 'whether or not colleagues and universities agree with the various ranking systems and league table findings is insignificant, rankings are here to stay' (UNESCO, 2010)
• 'educationalists are well able to find fault with rankings on numerous grounds and may reject them outright. However, given that they are here to stay…' (Tofallis, 2012)
• 'while many institutions had reservations about the methodologies used by the rankings compilers, there was a growing recognition that rankings and classifications were here to stay' (Osborne, 2013)

Conclusions

Page 44

More at: http://composite-indicators.jrc.ec.europa.eu (or simply Google "composite indicators" – 1st hit)

Page 45

1. Paruolo P., Saisana M., Saltelli A., 2013, Ratings and Rankings: voodoo or science? J Royal Statistical Society A 176(2).

2. Saisana M., D’Hombres B., Saltelli A., 2011, Rickety Numbers: Volatility of university rankings and policy implications. Research Policy 40, 165–177.

3. Saisana M., D’Hombres B., 2008, Higher Education Rankings: Robustness Issues and Critical Assessment, EUR 23487, Joint Research Centre, Publications Office of the European Union, Italy.

4. Saisana M., Saltelli A., Tarantola S., 2005, Uncertainty and sensitivity analysis techniques as tools for the analysis and validation of composite indicators. J Royal Statistical Society A 168(2), 307-323.

5. OECD/JRC, 2008, Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD Publishing, ISBN 978-92-64-04345-9.

References and Related Reading