Research Evaluation: When you measure a system, you change the system



RESEARCH EVALUATION: WHEN YOU MEASURE A SYSTEM, YOU CHANGE THE SYSTEM

Giorgio Sirilli

IRCrES-CNR Redazione ROARS


ROARS

Start: 2011
Members of the editorial board: 14
Collaborators: 250
Contacts: 10.6 million (November 2011 – May 2015)
Average daily contacts: 500 in November 2011; 8,000 in 2014
Articles published: 2,000
Comments by readers: 30,000
ROARS is ranked 8th among the top national cultural blogs.

ROARS, a genuine expression of democracy and participation, has become a very important player in the policy debate and in policy making.


Evaluation

Evaluation may be defined as an objective process aimed at the critical analysis of the relevance, efficiency, and effectiveness of policies, programmes, projects, institutions, groups, and individual researchers in the pursuit of their stated objectives.

Evaluation consists of a set of coordinated activities of a comparative nature, based on formalised methods and techniques applied through codified procedures, aimed at formulating an assessment of intentional interventions with reference to their implementation and their effectiveness.

Internal/external evaluation


The first evaluation (Genesis)

In the beginning God created the heaven and the earth. And God saw everything that He had made. "Behold", God said, "it is very good". And the evening and the morning were the sixth day.

And on the seventh day God rested from all His work. His Archangel came then unto Him asking, "God, how do you know that what You have created is 'very good'? What are Your criteria? On what data do You base Your judgement? Aren't You a little close to the situation to make a fair and unbiased evaluation?"

God thought about these questions all that day and His rest was greatly disturbed.

On the eighth day, God said, "Lucifer, go to hell!"

(From Halcom's "The Real Story of Paradise Lost")


A brief history of evaluation

Research Assessment Exercise (RAE)
Research Excellence Framework (REF) (impact)

"The REF will over time doubtless become more sophisticated and burdensome. In short we are creating a Frankenstein monster" (Ben Martin)

Italy, a latecomer. Evaluation in Italy: yes or no? Yes, but … good evaluation.


What do we evaluate?


The value of science

William Gladstone, then British Chancellor of the Exchequer (minister of finance), asked Michael Faraday about the practical value of electricity. Gladstone's only comment was, "But, after all, what use is it?" Faraday replied, "Why, sir, there is every probability that you will soon be able to tax it."

(Portraits: Michael Faraday and William Gladstone)


The case of physicists

Bruno Maksimovič Pontekorvo


The case of physicists

"Physics is a single discipline, but unfortunately nowadays physicists belong to two different groups: the theoreticians and the experimentalists. If a theoretician does not possess extraordinary ability, his work does not make sense…. In experimental work, by contrast, even a person of average ability can do useful work…" (Enrico Fermi, 1931)

(Original Italian: "La fisica è una sola ma disgraziatamente oggi i fisici sono divisi in due categorie: i teorici e gli sperimentatori. Se un teorico non possiede straordinarie capacità il suo lavoro non ha senso… Per quanto riguarda la sperimentazione invece anche una persona di medie capacità ha la possibilità di svolgere un lavoro utile.")


The case of graphene

Graphene is an allotrope of carbon in the form of a two-dimensional, atomic-scale, hexagonal lattice. Graphene has many extraordinary properties: it is about 100 times stronger than steel by weight, conducts heat and electricity with great efficiency, and is nearly transparent.

It was first measurably produced and isolated in the lab by scientists in 2003. Andre Geim and Konstantin Novoselov at the University of Manchester won the Nobel Prize in Physics in 2010 "for groundbreaking experiments regarding graphene."

The global market for graphene is reported to have reached $9 million by 2014, with most sales in the semiconductor, electronics, battery/energy, and composites industries.


The famous paper by Andre Geim and Konstantin Novoselov was published in 2004, and by 2007 it was indeed quite famous and highly cited.

The point is whether a committee would have selected his project and awarded him an ERC Starting Grant in 2004. Looking at his citation and publication records in 2004, it is very improbable that he would have been considered among the top 10%.

The case of graphene


(Charts, both labelled 2004: the publication and citation records as of that year.)

The case of graphene


The knowledge bundle


The knowledge institutions

University: teaching, research, "third mission"

Research agencies: research, problem solving, management


The neo-conservative wave of the 1980s


The new catchwords

New public management
Value for money
Accountability
Relevance
Excellence


The neo-conservative wave in Italy

Letizia Moratti, Italian minister of education and research:

"You first show that you use public money efficiently and effectively, then we will loosen the purse strings." It never happened!


A model of firm-style management based on the principles of competitiveness and customer satisfaction (the market).

The catchwords:
competitiveness
excellence
meritocracy

The "evaluative state" as the "minimum state", in which the government gives up its political responsibility, avoids the democratic debate in search of consensus, and relies on the "automatic pilot" of techno-administrative control.

Contro l'ideologia della valutazione. L'ANVUR e l'arte della rottamazione dell'università (Against the ideology of evaluation: ANVUR and the art of scrapping the university)


Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università

“ANVUR is much more than an administrative branch. It is the outcome of a cultural and political project aimed at reducing the range of alternatives and hampering pluralism.”

Sergio Benedetto


Changes in university life

The university is now at the mercy of:

- increasing bibliometric measurement
- quality standards
- blind refereeing (the referee sees you, but you do not see them)
- bibliometric medians
- journal classifications (A, B, C, …)
- opportunistic citing
- academic tourism
- administrative burden
- …….


Interviews with Italian researchers (40–65 years old)

Main results:

A drastic change in researchers' attitudes due to the introduction of bibliometrics-based evaluation

Bibliometrics-based evaluation has an extremely strong normative effect on scientific practices, which deeply impacts the epistemic status of the disciplines

The epistemic consequences of bibliometrics-based evaluation

(T. Castellani, E. Pontecorvo, A. Valente, "Epistemological Consequences of Bibliometrics: Insights from the Scientific Community", Social Epistemology Review and Reply Collective, vol. 3, no. 11, 2014.)


Results

1. Bibliometrics-based evaluation criteria have changed the way scientists choose their research topics:
- choosing a fashionable theme
- placing the article in the tail of an important discovery (bandwagon effect)
- choosing short empirical papers
2. The hurry to publish
3. Interdisciplinary topics are hindered; bibliometric evaluation systems encourage researchers not to change topic during their career
4. Repetition of experiments is discouraged; only new results are considered interesting

(T. Castellani, E. Pontecorvo, A. Valente, "Epistemological Consequences of Bibliometrics: Insights from the Scientific Community", Social Epistemology Review and Reply Collective, vol. 3, no. 11, 2014.)

The epistemic consequences of bibliometrics-based evaluation


Excellence

CNR Statute 2011

CNR Statute 2015


Research evaluation

Indicators used

- bibliometrics
- R&D
- peer review
- students
- graduates
- patents
- spin-offs
- contracts and other funding
- other


Some indicators

Number of publications
Number of citations
Impact factor
h-index


Use of publications for decision making

The case of China (SCI)
The case of Russia


The h-index (Jorge Eduardo Hirsch)

In 2005, the physicist Jorge Hirsch suggested a new index to measure the broad impact of an individual scientist's work: the h-index.

A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np − h) papers have at most h citations each.

In plain terms, a researcher has an h-index of 20 if he or she has published 20 articles receiving at least 20 citations each.
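As a concrete illustration of the definition above, here is a minimal sketch in Python (not from the talk; the citation counts are invented):

```python
def h_index(citations):
    """Hirsch's h-index: the largest h such that at least h papers
    have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # still at least h papers with >= h citations
        else:
            break
    return h

# The slide's plain-terms example: 20 articles with at least
# 20 citations each give an h-index of 20.
print(h_index([25] * 20 + [3] * 5))  # -> 20
```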


Impact factor (Eugene Garfield)

The impact factor (IF) of an academic journal is a measure reflecting the average number of citations to recent articles published in that journal. It is frequently used as a proxy for the relative importance of a journal within its field. In any given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. For example, if a journal has an impact factor of 3 in 2008, then its papers published in 2006 and 2007 received 3 citations each on average in 2008. ("Citable items" for this calculation are usually articles, reviews, proceedings, or notes; not editorials or letters to the editor).
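The two-year calculation can be made explicit with a small sketch (illustrative only; the figures below are invented, not taken from any real journal):

```python
def impact_factor(cites_in_year, items_prev_two_years):
    """Impact factor for year Y: citations received in Y to items
    published in Y-1 and Y-2, divided by the number of citable
    items published in Y-1 and Y-2."""
    return cites_in_year / items_prev_two_years

# The slide's example: an IF of 3 in 2008 means papers from
# 2006-2007 were cited 3 times each, on average, during 2008;
# e.g. 600 citations in 2008 to 200 citable items from 2006-2007.
print(impact_factor(600, 200))  # -> 3.0
```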


Nobel laureates and bibliometrics (the Higgs boson Nobel, 2013)

Peter Ware Higgs: 13 works, mostly in "minor" journals, h-index = 6
François Englert: 89 works, in both prestigious and minor journals, h-index = 10
W. S. Boyle: h-index = 7
G. E. Smith: h-index = 5
C. K. Kao: h-index = 1
T. Maskawa: h-index = 1
Y. Nambu: h-index = 17


Science and ideology: the impact on citations

(Chart: number of citations per year, on a scale of 0 to 3,000, to Marx and to Lenin; an annotation marks the fall of the Berlin Wall, Berlin, November 1989.)


San Francisco Declaration on Research Assessment

The Journal Impact Factor, as calculated by Thomson Reuters, was originally created as a tool to help librarians identify journals to purchase, not as a measure of the scientific quality of research in an article.

With that in mind, it is critical to understand that the Journal Impact Factor has a number of well-documented deficiencies as a tool for research assessment. These limitations include:

A) citation distributions within journals are highly skewed;
B) the properties of the Journal Impact Factor are field-specific: it is a composite of multiple, highly diverse article types, including primary research papers and reviews;
C) Journal Impact Factors can be manipulated (or "gamed") by editorial policy; and
D) data used to calculate the Journal Impact Factors are neither transparent nor openly available to the public.


San Francisco Declaration on Research Assessment

General Recommendation: Do not use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist's contributions, or in hiring, promotion, or funding decisions.


The Leiden manifesto on bibliometrics


The Leiden Manifesto

Bibliometrics: The Leiden Manifesto for research metrics

“Data are increasingly used to govern science. Research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics. The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied. We risk damaging the system with the very tools designed to improve it, as evaluation is increasingly implemented by organizations without knowledge of, or advice on, good practice and interpretation.”


The Leiden Manifesto – Ten principles

1) Quantitative evaluation should support qualitative, expert assessment.
2) Measure performance against the research missions of the institution, group or researcher.
3) Protect excellence in locally relevant research.
4) Keep data collection and analytical processes open, transparent and simple.
5) Allow those evaluated to verify data and analysis.


6) Account for variation by field in publication and citation practices.
7) Base assessment of individual researchers on a qualitative judgement of their portfolio.
8) Avoid misplaced concreteness and false precision.
9) Recognize the systemic effects of assessment and indicators.
10) Scrutinize indicators regularly and update them.

The Leiden Manifesto – Ten principles


Ranking universities and research agencies

(Table: rankings of research agencies, including CNR, Fraunhofer, and CNRS.)


Ranking universities and research agencies

Evaluating is difficult, and even dangerous ….


Ranking of universities

Four major sources of ranking

ARWU Shanghai (Shanghai Jiao Tong University)
QS World University Rankings
THE World University Rankings (Times Higher Education)
US News & World Report (Best Global Universities)

Criteria selected as the key pillars of what makes a world-class university:
• Research
• Teaching
• Employability
• Internationalisation
• Facilities
• Social Responsibility
• Innovation
• Arts & Culture
• Inclusiveness
• Specialist Criteria



Global rankings cover less than 3–5% of the world's universities (the Top 500 of roughly 17,500 universities is about 3%).

(Chart: number of universities by performance: the Top 20, the Top 500, the next 500, and the other 16,500 universities.)


Ranking of universities: the case of Italy

ARWU Shanghai (Shanghai Jiao Tong University)
QS World University Rankings
THE World University Rankings (Times Higher Education)
US News & World Report (Best Global Universities)

ARWU Shanghai: Bologna 173, Milano 186, Padova 188, Pisa 190, Sapienza 191
QS World University Rankings: Bologna 182, Sapienza 202, Politecnico di Milano 229
World University Ranking SA: Sapienza 95, Bologna 99, Pisa 184, Milano 193
US News & World Report: Sapienza 139, Bologna 146, Padova 146, Milano 155


The rank-ism (De Nicolao)

44 4444

The rank-ism (De Nicolao)

The vice-rector of the University of Pavia declared: "There are various rankings in the world: in each of them the University of Pavia ranks in the first 1%." But it is not true. According to three agencies, Pavia holds the following positions:
371: QS World University Rankings
251-275: Times Higher Education
401-500: Shanghai Ranking (ARWU)

45 4545

Evaluation is an expensive exercise

Rule of thumb: less than 1% of the R&D budget should be devoted to its evaluation.

Evaluation of the Quality of Research (VQR): 300 million euro (ROARS estimate); 182 million euro (Geuna estimate)

Research Assessment Exercise (RAE): 540 million euro

Research Excellence Framework (REF): 1 million pounds (500 million)

46 4646

Evaluation is an expensive exercise

National Scientific Habilitation: 126 million euro
- Cost per application: 2,300 euro
- Cost per job assigned: 32,000 euro
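A quick back-of-the-envelope check of these figures (a sketch assuming the stated unit costs; the implied counts are mine, not from the talk):

```python
# Implied volumes behind the slide's unit costs (rounded; an
# illustration, not official statistics).
total_cost = 126_000_000   # euro
per_application = 2_300    # euro
per_job = 32_000           # euro

print(total_cost / per_application)  # ~54,800 applications implied
print(total_cost / per_job)          # ~3,900 posts implied
```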

47 4747

Cost of evaluation: the saturation effect

Source: Geuna and Martin

48 4848

Source: Geuna and Martin

Cost of evaluation: a systematic loss

49 4949

Evaluation of the Quality of Research by ANVUR

Researchers' products to be evaluated:
- journal articles
- books and book chapters
- patents
- designs, exhibitions, software, manufactured items, prototypes, etc.

University teachers: 3 "products" over the period 2004-2010
Public Research Agencies researchers: 6 "products" over the period 2004-2010
Scores: from 1 (excellent) to -1 (missing)

Attention is basically concentrated here!

Evaluation of the Quality of Research by ANVUR

Indicators linked to research (weights in parentheses; a toy calculation follows the third-mission list below):
- quality (0.5)
- ability to attract resources (0.1)
- mobility (0.1)
- internationalisation (0.1)
- high-level education (0.1)
- own resources (0.05)
- improvement (0.05)

Evaluation of the Quality of Research by ANVUR

Indicators of the "third mission":
- fund raising (0.2)
- patents (0.1)
- spin-offs (0.1)
- incubators (0.1)
- consortia (0.1)
- archaeological sites (0.1)
- museums (0.1)
- other activities (0.2)
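As announced above, here is a minimal sketch of how such weighted indicators combine into a composite score (the weights are those on the slide; the scoring function and the example scores are my own illustration, not ANVUR's actual method):

```python
# Research-indicator weights from the slide (they sum to 1.0).
WEIGHTS = {
    "quality": 0.50,
    "attract resources": 0.10,
    "mobility": 0.10,
    "internationalisation": 0.10,
    "high-level education": 0.10,
    "own resources": 0.05,
    "improvement": 0.05,
}

def composite(scores):
    """Weighted sum of normalised indicator scores (each in 0-1)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical institution scoring 0.9 on quality, 0.6 elsewhere.
example = {name: 0.6 for name in WEIGHTS}
example["quality"] = 0.9
print(round(composite(example), 3))  # -> 0.75
```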

Call for Papers for Philosophy and Technology’s special issue: Toward a Philosophy of Impact

There was a time when serendipity played a central role in knowledge policy. Scientific advancement was viewed as essential for social progress, but this was paired with the assumption that it was generally impossible to steer research directly toward desired outcomes. Attempts to guide the course of research or predict its societal impacts were seen as impeding the advancement of science and thus of social welfare. Driven in part by budgetary constraints, and in part by ideology, the age of serendipity is being eclipsed by the age of accountability. Society increasingly requires academics to give an account of the value of their research. The ‘audit culture’ now permeates the university from STEM (science, technology, engineering, and math) through HASS (humanities, arts, and social sciences). Academics are being asked to consider not just how their work influences their disciplines, but also other disciplines and society more generally.


A warning

"Science today is riven with perverse incentives:
Researchers judge one another not by the quality of their science — who has time to read all that? — but by the pedigree of their journal publications.
High-profile journals pursue flashy results, many of which won't pan out on further scrutiny.
Universities reward researchers on those publication records.
Financing agencies, reliant on peer review, direct their grant money back toward those same winners.
Graduate students, dependent on their advisers and neglected by their universities, receive minimal, ad hoc training on proper experimental design, believing the system of rewards is how it always has been and how it always will be."

The Chronicle of Higher Education (March 16, 2015), "Amid a Sea of False Findings, the NIH Tries Reform", by Paul Voosen


Lessons from Research Evaluation

Evaluation in Italy is here to stay
The system has been measured, and it has changed
Awareness of the limitations of metrics
The challenge: prevent evaluation from becoming a Frankenstein monster

Main problems:
- League tables
- Competition vs cooperation among scientists
- Peer review vs bibliometrics
- NSE vs SSH
- Opportunistic behaviour
- The split of the academic community (the good and the bad guys)
- The equilibrium among teaching, research and the third mission
- Bureaucratisation
- The use of evaluation for policy purposes


Research Evaluation

Thank you for your attention