Venue Analytics: A Simple Alternative to Citation ... - arXiv

16
Venue Analytics: A Simple Alternative to Citation-Based Metrics Leonid Keselman Carnegie Mellon University Pittsburgh, PA [email protected] Figure 1: Results from our method, demonstrating differences and changes in conference quality over time in various subfields. These graphs includes size and year normalization and are the result of combining multiple different metrics of interest. ABSTRACT We present a method for automatically organizing and evaluating the quality of different publishing venues in Computer Science. Since this method only requires paper publication data as its in- put, we can demonstrate our method on a large portion of the DBLP dataset, spanning 50 years, with millions of authors and thou- sands of publishing venues. By formulating venue authorship as a regression problem and targeting metrics of interest, we obtain venue scores for every conference and journal in our dataset. The obtained scores can also provide a per-year model of conference quality, showing how fields develop and change over time. Ad- ditionally, these venue scores can be used to evaluate individual academic authors and academic institutions. We show that using venue scores to evaluate both authors and institutions produces quantitative measures that are comparable to approaches using citations or peer assessment. In contrast to many other existing evaluation metrics, our use of large-scale, openly available data enables this approach to be repeatable and transparent. To help others build upon this work, all of our code and data is available at https://github.com/leonidk/venue_scores. 1 INTRODUCTION There exist many tools to evaluate professional academic scholar- ship. For example Elseiver’s Scorpus provides many author-level and journal-level metrics to measure the impact of scholars and their work [14, 16]. Other publishers, such as the Public Library of Science, provide article-level metrics for their published work [19]. Large technology companies, such as Google and Microsoft provide their own publicly available metrics for scholarship [11]. Even inde- pendent research institutes, such as the Allen Institute’s Semantic Scholar [2], manage their own corpus and metrics for scholarly productivity. However, these author-based metrics (often derived from citation measurements) can be inconsistent, even across these large, established providers [15]. In this work, we propose a method for evaluating a comprehen- sive collection of published academic work by using an external evaluation metric. By taking a large collection of papers and using only information about their publication venue, who wrote them, and when, we provide a largely automated way of discovering not only venue’s value. Further, we also develop a system for automatic organization of venues. This is motivated by the desire for an open, reproducible, and objective metric that is not subject to some of the challenges inherent to citation-based methods [10, 15, 22]. We accomplish this by setting up a linear regression from a publication record to some metric of interest. We demonstrate three valid regression targets: status as a faculty member (a classification task), awarded grant amounts, and salaries. By using DBLP [32] as our source of publication data, NSF grants as our source of awards, University of California data for salaries, and CSRankings [6] for faculty affiliation status, we’re able to formulate these as large data regression tasks, with design matrix dimensions on the order of a million in each dimension. However, since these matrices are sparse, regression weights can be obtained efficiently on a single laptop computer. Details of our method are explained in section 4. We call our results venue scores and validate their performance in the tasks of evaluating conferences, evaluating professors, and ranking universities. We show that our venue scores correlate highly with other influence metrics, such as h-index [24], cita- tions or highly-influential citations [54]. Additionally, we show that university rankings, derived from publication records correlate highly with both established rankings [17, 37, 42] and with recently published quantitative metrics [6, 8, 12, 56]. arXiv:1904.12573v2 [cs.DL] 5 Jun 2019

Transcript of Venue Analytics: A Simple Alternative to Citation ... - arXiv

Page 1: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Venue Analytics: A Simple Alternative to Citation-Based MetricsLeonid Keselman

Carnegie Mellon UniversityPittsburgh, PA

[email protected]

1970 1980 1990 2000 2010year

2

4

6

8

10

12

venu

e sc

ores

AI/ML

NIPS (10.9)ICML (9.5)AAAI (7.7)AISTATS (7.5)UAI (5.8)IJCAI (5.5)

1970 1980 1990 2000 2010year

1

2

3

4

5

6

7

venu

e sc

ores

HCI

CHI (7.2)CSCW (5.2)UIST (5.0)UbiComp (4.8)ICWSM (4.6)DIS (3.1)

1970 1980 1990 2000 2010year

1

2

3

4

5

6

venu

e sc

ores

Robotics

RSS (6.2)ICRA (5.2)WAFR (5.2)IJRR (5.1)HRI (4.1)IROS (3.8)

1970 1980 1990 2000 2010year

2

4

6

8

10

12

venu

e sc

ores

Theory

STOC (11.7)FOCS (11.0)CCC (10.4)SODA (10.2)APPROX-RANDOM (7.9)COLT (7.8)

1970 1980 1990 2000 2010year

123456789

venu

e sc

ores

Vision

CVPR (8.2)ECCV (6.4)ICCV (6.2)PAMI (5.5)IJCV (4.3)WACV (3.7)

1970 1980 1990 2000 2010year

2

4

6

8

10

venu

e sc

ores

Database

VLDB (7.8)SIGMOD (7.6)ICDE (6.4)PODS (5.9)CIDR (5.6)TDS (5.1)

1970 1980 1990 2000 2010year

1

2

3

4

5

6

7

venu

e sc

ores

IR/Web

WWW (6.8)KDD (6.2)CIKM (4.7)ICWSM (4.6)ICDM (4.5)SIGIR (4.2)

1970 1980 1990 2000 2010year

1

2

3

4

5

6ve

nue

scor

es

NLP

EMNLP (6.6)ACL (5.9)NAACL (5.3)TACL (4.8)COLING (3.3)INTERSPEECH (1.7)

1970 1980 1990 2000 2010year

12345678

venu

e sc

ores

Graphics

SIGGRAPH (7.6)SIGGRAPH Asia (7.4)TOG (5.5)TVCG (5.0)CGF (4.9)SCA (4.7)

1970 1980 1990 2000 2010year

123456789

venu

e sc

ores

Crypto

TCC (9.1)CRYPTO (7.2)EUROCRYPT (7.2)J. Cryptology (4.9)ASIACRYPT (4.0)PKC (3.8)

Figure 1: Results fromourmethod, demonstrating differences and changes in conference quality over time in various subfields.These graphs includes size and year normalization and are the result of combining multiple different metrics of interest.

ABSTRACTWe present a method for automatically organizing and evaluatingthe quality of different publishing venues in Computer Science.Since this method only requires paper publication data as its in-put, we can demonstrate our method on a large portion of theDBLP dataset, spanning 50 years, with millions of authors and thou-sands of publishing venues. By formulating venue authorship asa regression problem and targeting metrics of interest, we obtainvenue scores for every conference and journal in our dataset. Theobtained scores can also provide a per-year model of conferencequality, showing how fields develop and change over time. Ad-ditionally, these venue scores can be used to evaluate individualacademic authors and academic institutions. We show that usingvenue scores to evaluate both authors and institutions producesquantitative measures that are comparable to approaches usingcitations or peer assessment. In contrast to many other existingevaluation metrics, our use of large-scale, openly available dataenables this approach to be repeatable and transparent.

To help others build upon this work, all of our code and data isavailable at https://github.com/leonidk/venue_scores.

1 INTRODUCTIONThere exist many tools to evaluate professional academic scholar-ship. For example Elseiver’s Scorpus provides many author-leveland journal-level metrics to measure the impact of scholars andtheir work [14, 16]. Other publishers, such as the Public Library ofScience, provide article-level metrics for their published work [19].Large technology companies, such as Google and Microsoft providetheir own publicly available metrics for scholarship [11]. Even inde-pendent research institutes, such as the Allen Institute’s SemanticScholar [2], manage their own corpus and metrics for scholarlyproductivity. However, these author-based metrics (often derived

from citation measurements) can be inconsistent, even across theselarge, established providers [15].

In this work, we propose a method for evaluating a comprehen-sive collection of published academic work by using an externalevaluation metric. By taking a large collection of papers and usingonly information about their publication venue, who wrote them,and when, we provide a largely automated way of discovering notonly venue’s value. Further, we also develop a system for automaticorganization of venues. This is motivated by the desire for an open,reproducible, and objective metric that is not subject to some of thechallenges inherent to citation-based methods [10, 15, 22].

We accomplish this by setting up a linear regression from apublication record to some metric of interest. We demonstrate threevalid regression targets: status as a faculty member (a classificationtask), awarded grant amounts, and salaries. By using DBLP [32] asour source of publication data, NSF grants as our source of awards,University of California data for salaries, and CSRankings [6] forfaculty affiliation status, we’re able to formulate these as large dataregression tasks, with design matrix dimensions on the order ofa million in each dimension. However, since these matrices aresparse, regression weights can be obtained efficiently on a singlelaptop computer. Details of our method are explained in section 4.

We call our results venue scores and validate their performancein the tasks of evaluating conferences, evaluating professors, andranking universities. We show that our venue scores correlatehighly with other influence metrics, such as h-index [24], cita-tions or highly-influential citations [54]. Additionally, we showthat university rankings, derived from publication records correlatehighly with both established rankings [17, 37, 42] and with recentlypublished quantitative metrics [6, 8, 12, 56].

arX

iv:1

904.

1257

3v2

[cs

.DL

] 5

Jun

201

9

Page 2: Venue Analytics: A Simple Alternative to Citation ... - arXiv

2 RELATEDWORKQuantitative measures of academic productivity tend to focus onmethods derived from citation counts. By using citation count as theprimary method of scoring a paper, one can decouple an individualarticle from the authors whowrote it and the venue it was publishedin. Then, robust citation count statistics, such as h-index [24], canbe used as a method of scoring either individual authors, or aspecific venue. Specific critiques of h-index scores arose almost assoon as the h-index was published, ranging from a claimed lack-of-utility [31], to a loss of discriminatory power [53].

Citations can also be automatically analyzed for whether or notthey’re highly influential to the citing paper, producing a "influen-tial citations" metric used by Semantic Scholar [54]. Even further,techniques from graph and network analysis can be used to under-stand systematic relationships in the citation graph [7]. Citation-based metrics can even be used to provide a ranking of differentuniversities [8].

Citations-based metrics, despite their wide deployment in thescientometrics community, have several problems. For one, cita-tion behavior varies widely by fields [10]. Additionally, citationsoften exhibits non-trivial temporal behavior [22], which also variesgreatly by sub-field. These issues highly affect one’s ability to com-pare across disciplines and produce different scores at differenttimes. Recent work suggests that citation-based metrics struggleto effectively capture a venue’s quality with a single number [57].Comparing citation counts with statistical significance requires anorder-of-magnitude difference in the citation counts [30], which lim-its their utility in making fine-grained choices. Despite these qualityissues, recent work [56] has demonstrated that citation-based met-rics can be used to build a university ranking that correlates highlywith peer assessment; we show that our method provides a similarquality of correlation.

Our use of straightforward publication data (Section 3) enablesa much simpler model. This simplicity is key, as the challenges inmaintaining good citation data have resulted in the major sourcesof h-index scores being inconsistent with one another [15].

While there exist many forms of ranking journals such as Eigen-factor [7] or SJR [18], these tend to focus on journal-level metricswhile our work focuses on all venues, including conferences.

2.1 Venue MetricsWe are not the first to propose that scholars and institutions canbe ranked by assigning scores to published papers[44]. However,in prior work the list of venues is often manually curated andassigned equal credit, a trend that is true for studies in 1990s [23]and their modern online versions [6]. Instead, we propose a methodfor obtaining automatic scores for each venue, and in doing so,requires no manual curation of valid venues.

Previous work [58] has developed methods for ranking venuesautomatically, generating unique scores based only on author databy labeling and propagating notions of "good" papers and author-itative authors. However, this work required a manually curatedseed of what good work is. It was only demonstrated to work ona small sub-field of conferences, as new publication cliques wouldrequire new labeling of "good" papers. Recent developments in

network-based techniques for ranking venues have included cita-tion information [60], and are able to produce temporal models ofquality. In contrast, our proposed model doesn’t require citationdata to produce sensible venue scores.

Our work, in some ways, is most similar to that of CSRank-ings [6]. CSRankings maintains a highly curated set of top-tiervenues; venues selected for inclusion are given 1 point per paper,while excluded venues are given 0 points. Additionally, universityrankings produced by CSRankings include a manually curated setof categories, and rankings are produced via a geometric meanover these categories. In comparison, in this work, we produceunique scores for every venue and simply sum together scores forevaluating authors and institutions.

In one of the formulations of our method, we use authors statusas faculty (or not faculty) to generate our venue scores. We arenot alone in this line of analysis, as recent work has demonstratedthat faculty hiring information can be used to generate universityprestige rankings [12].

Many existing approaches either focus only on journals [18], ordo not have their rankings available online. For our dataset, we donot have citation-level data available, so we are unable to compareagainst certain existing methods on our dataset. However, as thesemethods often deploy a variant of PageRank [39], we describe aPageRank baseline in Section 6.1 and report its results.

3 DATAOur primary data is the dblp computer science bibliography [32].DBLP contains millions of articles, with millions of authors acrossthousands of venues in Computer Science. We produced the resultsin this paper by using the dblp-2019-01-01 snapshot. We restrictedourselves to only consider conference and journal publications,skipping books preprints, articles below 6 pages and over 100 pages.We also merged dblp entries corresponding to the same confer-ence. This lead to a dataset featuring 2,965,464 papers, written by1,766,675 authors, across 11,255 uniquely named venues and 50years of publications (1970 through 2019).

Our first metric of interest is an individual’s status as a facultymember at a university. For this, we used the faculty affiliationdata from CSRankings [6], which are manually curated and containhundreds of universities across the world and about 15,000 profes-sors. For evaluation against other university rankings, we used theScholarRank [56] data to obtain faculty affiliation, which containsa more complete survey of American universities (including morethan 50 not currently included in CS Rankings). While CSRank-ings data is curated to have correct DBLP names for faculty, theScholarRank data does not. To obtain a valid affiliation, the nameswere automatically aligned with fuzzy string matching, resultingin about 4,000 faculty with good seemingly unique DBLP namesand correct university affiliation. Although those two methods aremanually curated, automatic surveys of faculty affiliations haverecently been demonstrated [36].

Our second metric of interest was National Science Foundationgrants, where we used awards from 1970 until 2018. This data isavailable directly from the NSF [20]. We adjusted award amountsusing annual CPI inflation data. We restricted ourselves to awardsthat had finite amount, where we could match at least half the

2

Page 3: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Principal Investigators on the grant to DBLP names and the grantwas above $20,000. Award amounts over 10 million dollars wereclipped in a smooth way to avoidmatching to a few extreme outliers.This resulted in 407,012 NSF grants used in building our model.

Our third and final metric of interest was University of Califor-nia salary data [28]. This was inspired by a paper that predictedACM/IEEE Fellowships for 87 professors and used salary data [38].We looked at professors across the entire University of Californiasystem, matching their names to DBLP entries in an automated way.We used the maximum salary amount for a given individual acrossthe 2015, 2016 and 2017 datasets, skipping individuals making lessthan 120, 000 or over 800, 000 dollars. This resulted in 2,436 individ-uals, down from 3,102 names that we matched and an initial set ofabout 20,000 initial professors. As DBLP contains some Chemistry,Biology, and Economics venues, we expect that some of these arelikely not Computer Science professors.

Table 1: Spearman’s ρ correlation between rankings pro-duced by targeting different metrics of interest.

Faculty NSF Salary

Faculty 1.00 0.91 0.84NSF 0.91 1.00 0.86Salary 0.84 0.86 1.00

4 METHODOur basic model is that a paper in a given publication venue (eithera conference or a journal), having passed peer review, is wortha certain amount of value. Certain venues are more prestigious,impactful or selective, and thus should have higher scores. Othervenues have less strict standards, or perhaps provide less opportu-nity to disseminate their ideas [35] and should be worth less. Whilethis model explicitly ignores the differences in paper quality at apublication venue, discarding this information enables the use of alarge quantity of data to develop a statistical scoring system.

This methodology is not valid for all fields of science, nor allmodels of how impactful ideas are developed and disseminated.Our method requires individual authors have multiple publicationsacross many different venues, which is more true in ComputerScience than the natural sciences or humanities, where publishingrates are lower [33]. If instead we assume that all research ideasproduce only a single paper, or that passing peer-review is a noisymeasurement of quality [50], then our proposed method would notwork very well. Instead, the underlying process which supports ourmethodology is that good research ideas produce multiple researchpublications in selective venues; better ideas would produce moreindividual publications in higher quality venues. The concept ofAll models are wrong, but some are useful is our guide here. Thisassumption allows us to obtain venue scores in an automatic way,and then use these scores to evaluate both authors and institutions.

Our approach to ranking venues is to construct a regression task,apply an optimization method to solve it, and use the resultingweights. Optimization’s ability to generate powerful representa-tions is well-studied in machine learning [47] and statistics [27].

The resulting weights are often useful [13], and this methodologyis widely used in natural language processing [34, 41].

4.1 Formal SetupIn general, we will obtain a score for each venue by setting up alinear regression task in the form of equation 1.

©­­­­«

conf1 conf2 . . . confnauth1 1 3 . . . 0 1auth2 1 0 . . . 1 1...

......

. . ....

...

authm 0 2 . . . 4 1

ª®®®®¬©­­­­«x0x1...

xn

ª®®®®¬=

©­­­­«isProf1isProf2...

isProfm

ª®®®®¬(1)

Authors are listed along the rows, and venues are listed alongthe columns. The number of publications that an author has in aconference is noted in the design matrix. There is an additionalcolumn of 1s to learn a bias offset in the regression.

Different forms of counting author credit are discussed in sec-tion 4.6, while different regression targets are discussed in section 3.In equation 1, the regression target is shown as a binary variableindicating whether or not that author is currently a professor. Ifthis linear system, Ax = b is solved, then the vector x will con-tain real-valued scores for every single publishing venue. Since oursystem is over-determined, there is generally no exact solution.

Instead of solving this sparse linear system directly, we insteadsolve a regularized regression, using a robust loss and L2 regular-ization. That is, we iteratively minimize the following expressionvia stochastic gradient descent [45]

L(Ax ,b) + λ | |x | |2 (2)

The L2 regularization enforces a Gaussian prior on the learnedconference scores. We can perform this minimization in Pythonusing common machine learning software packages [40]. We tendto use a robust loss function, such as the Huber loss in the case ofregression [26], which is quadratic for errors of less than δ andlinear for errors of larger than δ . It can be written as

L(y,y) ={12 (y − y)2, if |y − y | ≤ δ

δ |y − y | − 12δ

2, otherwise(3)

In the case of classification, we have labels y ∈ {−1, 1} and use themodified Huber loss [61],

L(y,y) ={max(0, 1 − yy)2, if yy ≥ −1−4yy, otherwise

(4)

We experimented with other loss functions, such as the logisticloss, and while they tended to produce similar rankings and results,we found that the modified Huber loss provided better empiricalperformance in our test metrics, even though the resulting curveslooked very similar upon qualitative inspection.

3

Page 4: Venue Analytics: A Simple Alternative to Citation ... - arXiv

4.2 Metrics of InterestAs detailed in section 3, we targeted three metrics of interest: statusas a faculty member, NSF award sizes, and professor salaries. Eachof these metrics came from a completely independent data source,and we found that they each had their own biases and strengths(more in section 5).

For faculty status classification, we used the modified huberloss and CSRankings [6] faculty affiliations. To build venue scoresthat reward top-tier conferences more highly, we only gave pro-fessors in the top-k ranked universities positive labels. We triedk = 5, 16, 40, 80, and found that we got qualitatively different resultswith quantitative similar performance. Unless otherwise stated,we used k = 40. The university ranking used to select top-k wasCSRankings itself, and included international universities, coveringthe Americas, Europe and Asia. This classification is performedacross all authors, leading to 1.7 million rows in our design matrix.

For the NSF awards, every Principal Investigator on the awardhad their papers up to the award year as the input features. Weused a Huber loss, λ = 0.03, and experimented with normalizingour award data to have zero mean and unit variance. Additionally,we built models for both raw award sizes and log of award sizes; theraw award sizes seem to follow a power-law while the log awardsizes seem distributed as approximately a Gaussian. Another modelchoice is whether to regress each NSF grant as an independentmeasurement or instead a marginal measurement which tracks thecumulative total of NSF grants received by the authors. If not allauthors on the grant were matched to DBLP names, we only usedthe fraction of the award corresponding to the fraction of identifiedauthors. This regression had ∼ 1

2 million rows in its design matrix .For the salary data, we found that normalizing the salary data

to have zero mean and unit variance led to a very poor regres-sion result, while having no normalization produced a good result.This regression only had ∼ 2, 400 datapoints, and thus providedinformation about fewer venues than the other metrics of interest.

4.3 Modeling Change Over TimeIn modeling conference values, we wanted to build a model thatcould adjust for different values in different years. For example,a venue may be considered excellent in the 1980s, but may havedeclined in influence and prestige since then. To account for thisbehavior, we break our dataset into chunks of n years and create adifferent regression variable for each conference for each chunk.The non-temporal model is obtained simply by setting n ≥ 50.

We also examine a model that creates an independent variablefor each year, for each conference. After setting n = 1 in the blockmodel, we splat each publication as a Gaussian at a given year.By modifying σ , we can control the smoothness of the obtainedweights. The Gaussian is applied via a sparse matrix multiply of thedesign matrix A with the appropriate band diagonal sparse matrixG. The use of a truncated Gaussian (where p < 0.05 is clippedand the Gaussian is re-normalized) enables our matrix to maintainsparsity. We used a σ = 4.5, which produced an effective windowsize of about 10 years. This can be seen visually in Figure 2.

Our different temporal models are compared in Table 5 by corre-lating against existing author-level, journal-level and university-level metrics. For evaluation details see Section 6.

1970 1980 1990 2000 2010 20200.0

0.1

0.2

0.3

0.4

0.5

1970 1980 1990 2000 2010 20200.0

0.1

0.2

0.3

0.4

0.5

Figure 2: Truncated Gaussian (σ = 4.5) used to splat a pub-lication’s value across multiple years. Examples centered atthe year 2000 and the year 2018

1970 1980 1990 2000 2010year

0

5

10

15

20

25

venu

e sc

ores

(in

stan

dard

dev

iatio

ns)

None

NIPSAAAIICMLIJCAIAISTATSUAI

1970 1980 1990 2000 2010year

0

2

4

6

8

10

venu

e sc

ores

(in

stan

dard

dev

iatio

ns)

Size

NIPSAAAIICMLIJCAIAISTATSUAI

1970 1980 1990 2000 2010year

0.0

2.5

5.0

7.5

10.0

12.5

15.0

17.5

venu

e sc

ores

(in

stan

dard

dev

iatio

ns)

Year

NIPSAAAIICMLIJCAIAISTATSUAI

1970 1980 1990 2000 2010year

0

2

4

6

8

venu

e sc

ores

(in

stan

dard

dev

iatio

ns)

Year + Size

NIPSAAAIICMLIJCAIAISTATSUAI

Normalization Methods

Figure 3: Results showing the effect of performing a normal-ization for venue year and size. See Sections 4.4 and 4.5.

4.4 Normalizing Differences Across YearsThe temporal models described in the previous section have aninherent bias. Due to the temporal window of the DBLP publica-tion history, there is variation in distributed value due to changesannual NSF funding, the survivorship of current academic faculty,etc. To adjust for this bias, we scale each conference-year valueby the standard deviation of conference values in that year. Thisscaling can help or hurt performance, depending on which metricof interest the model is built against. It generally produces flattervalue scores over time but leads to some artifacts. The effects of thisnormalization are shown in Figure 3 and Table 2. In our final ver-sion, we instead normalize by the average of the top 10 conferencesin each year, which produced flatter results over time,

4

Page 5: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 2: Spearman correlation between our model and exist-ing metrics, showing the effect of different normalizationschemes. See Sections 4.4 and 4.5 for model details. See Sec-tion 6 for evaluation details.

Normalization influentialcitations(author)

h-index(author)

h-index(university)

h-index(venue)

None 0.72 0.61 0.60 0.26Year 0.72 0.66 0.58 0.18Size 0.74 0.63 0.61 0.25Year + Size 0.72 0.61 0.60 0.26

4.5 Normalizing Differences In Venue SizeOur model uses L2 regularization on the venue scores, which tendsto squash the value of variables with less explanatory power.Thisprocess often resulted in the under-valuing of smaller, more se-lective venues. To correct for this size bias, instead of giving eachpaper 1 point of value in the design matrix, we give each paper 1

Mcredit, whereM is the number of papers at that venue in that year;this step is performed before Gaussian splatting. This producedrankings which emphasized small venues and provided a good no-tion of efficient venues. In practice, we instead used a blended resultwith a multiplier of 1

α√Mwith α = 1.5849, the Hausdorff dimension

of the Sierpinski triangle, an arbitrary constant.

4.6 Modeling Author PositionAnother question to consider is how credit for a paper is dividedup amongst the authors. We consider four models of authorshipcredit assignment:

(1) Authors get 1n credit for each paper, where n is the number

of authors on the paper. Used by [6].(2) All authors get full credit (1 point) for each paper(3) Authors receive less credit for later positions ( 11 ,

12 ,

13 · · ·

1n ),

normalized so total credit sums to 1). This model assignsmore credit to earlier authors and is used by certain practi-tioners [25, 49].

(4) The same as (3), except the last author is explicitly assignedequal credit with the first author before normalization.

Using Spearman correlation with Semantic Scholar’s "highly influ-ential citations", an evaluation metric described in depth in Sec. 6.4,we can evaluate each of these models. Specifically, there are twoplaces where venues scores require a selection of authorship model.The first is how much credit is assigned to each paper when per-forming regression (in the case of our faculty metric of interest). Thesecond is when evaluating authors with the obtained regressionvector. See table 3 for a summary of experimental results. For thepurposes of evaluation, assigning full credit to authors (model 2)produced the best results, while model 3 consistently produced thelowest quality correlations. On the other hand, for the purposes ofperforming the classification task, the roles are flipped. Assigningfull credit (model 2) consistently produces the worst quality correla-tions while using model 3 produces the highest quality correlations.

Table 3: Correlation (Spearman’s ρ) between our model andSemantic Scholar [54], showing the properties of differentauthorship models. For details see section 4.6.

Evaluation Author Model1 2 3 4

Regressio

nAutho

rMod

el

1 0.70 0.72 0.65 0.702 0.68 0.71 0.61 0.673 0.71 0.73 0.66 0.714 0.70 0.72 0.65 0.71

Table 4: Correlation between ourmodel and traditionalmea-sures of scholarly output on the dataset of CMU faculty. Formodel details see Section 4.7. For evaluation details see Sec-tion 6.4.

Model citations h-index [24] influential citations [54]

Faculty 0.59 0.68 0.71NSF 0.63 0.66 0.67Salary 0.36 0.36 0.41Combined 0.69 0.77 0.75

4.7 Combining ModelsSince the proposed metrics of interest (faculty status, NSF awards,salaries) were generated from different independent regressiontargets, with different sized design matrices, there may be value incombining them to produce a joint model. The value of ensemblemodels is well documented in both theory [21] and practice [4]. Inthe absence of a preferred metric with which to cross-validate ourmodel, we simply perform an unweighted average of our models toobtain a gold model. To ensure that the weights are of similar scale,the conference scores are normalized to have zero mean and unitvariance before combining them. Venue scores that are too large ortoo small are clipped at 12 standard deviations. For the temporalmodels, this normalization is performed on a per-year basis. Whiletable 4 shows results for a simple combination, one could averagetogether many models with different choices of hyperparameters,regression functions, datasets, filters to scrub the data, etc.

5 RESULTSA visual example of some of venue scores is shown in Figure 1.We kept the y-axis fixed across all the different Computer Sciencesub-disciplines to show how venue scores can be used to comparedifferent fields in a unified metric. There are additional results inour qualitative demonstration of normalization methods, Figure 3.

Due to the variation in rankings produced by one’s choice ofhyperparameters, and the large set of venues being evaluated, we donot have a canonical set of rankings that can be presented succinctlyhere. Instead, wewill focus on quantitative evaluations of our resultsin the following section.

5

Page 6: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 5: Correlation between rankings produced by ourmodel against rankings produced by traditional scholarlymetrics. Different rows correspond to different hyperparam-eters choices for our model. Each column is a correspondsto a traditional metric. AI = Author Highly Influential Ci-tations, AH = Author H-index, USN = US News 2018, VH =Venue H-index, VC = Venue Citations.

Years Metric AI AH USN VH VC

σ = 4.5 Faculty 0.73 0.69 0.74 0.63 0.4210 0.67 0.57 0.76 0.57 0.3550 0.75 0.68 0.76 0.38 0.21

σ = 4.5 NSF 0.64 0.62 0.62 0.61 0.5910 0.68 0.68 0.60 0.59 0.6050 0.67 0.65 0.63 0.64 0.67

σ = 4.5 Salary 0.62 0.58 0.59 0.48 0.5510 0.65 0.62 0.57 0.45 0.5550 0.66 0.63 0.56 0.43 0.63

6 EVALUATIONTo validate the venue scores obtained by our regression methods,our evaluation consists of correlating our results against existingrankings andmetrics.We consider three classes of existing scholarlymeasurements to correlate against: those evaluating universities,authors, and venues. Each of these classes has different standardtechniques, and a different evaluation dataset, so they will be de-scribed separately in Sections 6.2, 6.4, and 6.3.

In the case of our proposed method, venue scores, we have asimple way to turn them from a journal-based to an author-basedor institution-based metric. Venues are evaluated directly with thescores. Authors are evaluated as the dot product of venue scoresand the publication vector of an author. Universities are evaluatedas the dot product of venue scores and the total publication vectorof all faculty affiliated with that university.

6.1 PageRank BaselineMany existing approaches build on the idea of eigenvalue central-ity [7, 58, 60].We implemented PageRank [39] using the power itera-tion method to compute a centrality measure to use for both author-level and venue-level metrics. Unlike most versions of PageRank,which use citation counts, we implement two variants based solelyon co-authorship information.

Author-level PageRank (PageRankA) is computed on the 1.7M x1.7M sized co-authorship graph, where an edge is added for everytime two authors co-author a paper. We found that the authorswith highest centrality measures are often common names withinsufficient disambiguation information in DBLP.

Journal-level PageRank (PageRankC) is computed on the 11,000x 11,000 co-authorship graph, where an edge is added for everyauthor who publishes in both venues. When ran on the unfilteredDBLP data, the highest scoring venue was arXiv, an expected result.

Table 6: Section 6.2. Kendall’s τ correlation across differentUniversity Rankings.

Ranking Correlation withUS News 2018

USN2018 [37] 1.000USN2010 [12] 0.928Venue Scores 0.780ScholarRank [56] 0.768ScholarRankFull 0.757CSMetrics [8] 0.746CSRankings [6] 0.724Times [17] 0.721NRC95 [12] 0.713t10Sum [56] 0.713Prestige [12] 0.666Citations [56] 0.665Shanghai [43] 0.586# of papers 0.585BestPaper [25] 0.559PageRankA 0.535PageRankC 0.532QS [42] 0.518

6.2 University RanksFor this work, we produce university rankings simply as an evalu-ation method to demonstrate the quality and utility of our venuescoring system. The reader is cautioned that university rankingsystems can tend to produce undesirable gaming behavior [29], andare prone to manipulation.

We obtained and aligned many existing university rankings forComputer Science departments. These include rankings curatedby journalistic sources, such as the US News Rankings [37], theQS Rankings [42], Shanghai Ranking [43], Times Higher EducationRankings [17] and the National Research Council report [12]. Inaddition, we consider purely quantitative evaluation systems suchas ScholarRank [56], CSRankings [6], CSMetrics [8], and PrestigeRankings [12].We additionally include ScholarRank’s t10sum acrossthe matched faculty that our venue scores result uses.

We follow a recent paper [56], which demonstrated the efficacyof a citation-based metric in producing rankings with large corre-lation against US News rankings. We extend these experiments toinclude more baselines. In contrast with [56], we use a rank corre-lation metric (namely Kendall’s τ ), which naturally handles ordinalranking systems. While ScholarRank [56] claimed a correlation of> 0.9 with US News, this was under Pearson’s correlation coef-ficient, and the result under Kendall’s τ is 0.768 in the publishedversion and 0.757 using full precision ScholarRank.

Our faculty-based regression is able to generate a result withthe highest correlation against the US News rankings. We performeven better than ScholarRank, which was designed to optimize thismetric (although under a non-rank correlation metric).

6

Page 7: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 7: Spearman’s ρ correlation between different journal-level metrics (N=1008). For details see Section 6.3.

papers citations h-index PageRankC

venuescores

papers 0.75 0.41 0.91 0.25citations 0.75 0.60 0.69 0.27h-index 0.41 0.60 0.37 0.65PageRankC 0.91 0.69 0.37 0.27venue scores 0.25 0.27 0.65 0.27

6.3 Journal-level metricsTo evaluate the fidelity of our venue scores for journals and con-ferences, we obtain the h-index [24] and citation count for 1,308conferences from Microsoft Academic Graph [48]. We continueto use Spearman’s ρ as our correlation metric, even though rank-correlation metrics can be highly impacted by noisy data [1].

Under this metric, venue scores correlated highly with h-index.Notably, h-index and venue scores are highly correlated with eachother, even though venue scores are much less correlated with rawconference size. See Figure 7 for detailed results.

6.4 Author-level MetricsTo evaluate our venue scores in the application of generating author-level metrics, we will use rank correlation (also known as Spear-man’s ρ) [52] between our venue scores and traditional author-levelmetrics such as h-index. Google Scholar was used to obtain cita-tions, h-index [24], i10-index, and Semantic Scholar used to obtainhighly influential citations [54]. Prior work has critiqued the h-index measure [59] and proposed an alternative metric, derivedfrom a citation count. However, our use of a rank correlation meansthat monotonically transformed approximations of citation countswould lead to identical scores.

For evaluation, we collected a dataset for the largest ComputerScience department in CSRankings (n = 148). The results are shownin Table 8. We can see that venue scores highly correlate with h-index, influential citations, i10-scores and CSRankings scores. Theresults from the author-based PageRank are surprisingly similarto our venue scores. However, the conference-based PageRankperformed worse than venue scores on every correlation metric.

7 DISCUSSIONOur results show a medium to strong correlation of venue scoresagainst existing scholarly metrics, such as citation count and h-index. For author metrics, venue scores correlate with influentialcitations [54] or h-index about as well as such measures correlateagainst each other or raw citation counts (see Table 8). For venuemetrics, venue scores correlate with h-index (0.64) and citations(0.61) nearly as well as citations correlate with h-index (0.66). Foruniversity metrics, venue scores correlate as well with measures ofpeer assessment as citation-based metrics do [56].

As h-index and citation counts have their flaws, obtaining perfectcorrelation is not necessarily a desirable goal. Instead, these strongcorrelations serve as evidence for the viability of venue scores.

1970 1980 1990 2000 2010 2020year

0.0

2.5

5.0

7.5

10.0

12.5

15.0

annu

al v

alue

(4.5

sig

ma

smoo

thin

g)

venue scores

Judea PearlLeonidas J. Guibas

Richard M. KarpTakeo Kanade

Michael I. JordanElisa Bertino

1970 1980 1990 2000 2010 2020year

0

5

10

15

20

25

30

35

annu

al v

alue

(4.5

sig

ma

smoo

thin

g)

paper count

1970 1980 1990 2000 2010 2020year

0

2

4

6

8

annu

al v

alue

(4.5

sig

ma

smoo

thin

g)

paper count (norm by annual paper totals)

Figure 4: The career arcs of several accomplished ComputerScientists. The first row uses a simple model where all pa-pers are all given equal weight; first using raw counts andthen normalizing by the number of papers published eachyear. The second row shows our model.

Venue scores have been shown to be robust against hyperparam-eter choices (Tables 2, 3, 4, 5). Even venue scores produced fromcompletely different data sources tend to look very similar (Table 1).Additionally, venue scores can naturally capture the variation ofconference quality over time (Figures 1, 3).

As with any inductive method, venue scores are data-drivenand will be subject to past biases. For example, venue scores canclearly be biased by hiring practices, pay inequality andNSF fundingpriorities. As these are the supervisingmetrics, bias in those datasetswill be encoded in our results. For example, we found that the facultyhiring metric prioritized Theoretical Computer Science, while usingNSF awards prioritized Robotics. The faculty classification taskmay devalue publishing areas where candidates pursue industryjobs, while the NSF grant regression task may devalue areas withsmaller capital requirements. By using large datasets and combiningmultiple metrics in a single model (Section 4.7), the final modelcould reduce the biases in any individual dataset.

Each of our metrics of interest has an inherent bias in timescale,which our temporal normalization tries to correct for, but likelydoes an incomplete job of. Salaries are often higher for senior fac-ulty. NSF Awards can have a long response time and a preferencestowards established researchers. Faculty classification prioritizesthe the productive years of existing faculty. Additionally, facultyhiring as a metric will have a bias towards work from prestigiousuniversities [12] and their venue preferences. Some of these issuesalso exist in citation metrics, and may be why our uncorrectedmodels correlated better with them (Table 2).

7

Page 8: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 8: Correlation between different author-level metrics for a dataset of professors (N=148). Details are in Section 6.4.

papers citations h-index i10 CSR [6] venue score PageRankA PageRankC influence [54]

papers 0.66 0.79 0.81 0.71 0.94 0.94 0.89 0.76citations 0.66 0.93 0.88 0.49 0.66 0.68 0.60 0.81h-index 0.79 0.93 0.97 0.56 0.75 0.81 0.68 0.80i10 0.81 0.88 0.97 0.53 0.75 0.82 0.69 0.73CSR [6] 0.71 0.49 0.56 0.53 0.84 0.64 0.80 0.64venue score 0.94 0.66 0.75 0.75 0.84 0.86 0.92 0.78PageRankA 0.94 0.68 0.81 0.82 0.64 0.86 0.83 0.72PageRankC 0.89 0.60 0.68 0.69 0.80 0.92 0.83 0.67influence [54] 0.76 0.81 0.80 0.73 0.64 0.78 0.72 0.67

8 SIMILARITY METRICSWhile the previous sections of this paper have focused on evalua-tion, the same dataset can be used to organize venues into groups.For organization, we use a much smaller dataset, using data since2005 and only evaluating the 1,155 venues that have at least 20R1 universities with faculty publishing in them. We then build thevenue × author matrix, counting the number of papers each thatauthor published in each venue. Performing a d-dimensional La-tent Dirichlet Allocation [9], we obtain a 50-dimensional vectorrepresenting each conference in a meaningful way.

These vectors can then be clustered [3] to produce automaticcategories for each conference. These high dimensional vectors canalso be embedded [55] into two dimensions to produce a visual mapof Computer Science. See Figure 5 for our result. These clustersrepresent natural categories in Computer Science. For example, it iseasy to see groups that could be called Theory, Artificial Intelligence,Machine Learning, Graphics, Vision, Parallel Computing, SoftwareEngineering, Human-Computer Interaction, among many others.

While some clusters are distinct and repeatable, others are not.When datasets contain challenging cases, ideal clustering can behard to estimate [5]. Using silhouette scores [46], we can estimatehow many natural clusters of publishing exist in Computer Science.In our experiments, silhouette scores were maximized with 20 to 28clusters. As the clustering process is stochastic, we were unable todetermine the optimal cluster number with statistical significance.

By embedding each author with the weighted average of theirpublication’s vectors, we can also obtain a fingerprint that showswhich areas of Computer Science each university focuses on. SeeFigure 6 for an example of such fingerprints for many top depart-ments. The same clustering method can be used to analyze thefocus areas of a single department, for an example see Figure 7.

9 CONCLUSIONWe have presented a method for ranking and organizing a schol-arly field based only on simple publication information- namelya list of papers, each labeled with only their published venue, au-thors, and year. By regressing venue scores from metrics of interest,one obtains a plausible set of venue scores. These scores can becompared across sub-fields and aggregated into author-level andinstitution-level metrics. The scores provided by this system, andtheir resulting rankings, correlate highly with other established

metrics. As this system is based on easily obtainable, publicly avail-able data, it is transparent and reproducible. Our method buildson simple techniques and demonstrates that their application tolarge-scale data can produce surprisingly robust and useful toolsfor scientometric analysis.

ACKNOWLEDGMENTSMartial Hebert suggested the use of correlation metrics as a tech-nique for quantitative evaluation, thereby providing the frameworkfor every table of results in this paper. Emery Berger developedCSRankings [6], which was highly influential in the design andimplementation of this project. Joseph Sill [51], Wayne Winston,Jeff Sagarin and Dan Rosenbaum developed Adjusted Plus-Minus, asports analytics technique that partially inspired this work. Kevin-jeet Gill was unrelenting in advocating for a year-by-year regressionmodel to avoid sampling and quantization artifacts.

8

Page 9: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Figure 5: Automatic clustering for venues in Computer Science, the largest venues in each cluster are labeled.

9

Page 10: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Carnegie Mellon University Massachusetts Institute of Technology Stanford University University of California - Berkeley Univ. of Illinois at Urbana-Champaign University of Michigan

Cornell University University of Washington Georgia Institute of Technology Tsinghua University ETH Zurich University of California - San Diego

University of Maryland - College Park University of Wisconsin - Madison National University of Singapore Columbia University University of Pennsylvania University of Toronto

University of Southern California University of Texas at Austin Northeastern University Princeton University University of California - Los Angeles Technion

Purdue University University of Edinburgh University of Massachusetts Amherst University of Waterloo EPFL Max Planck Institute

KAIST Peking University New York University Harvard University University of British Columbia University of California - Irvine

Figure 6: Heatmap showing the differences in research focusacross different universities. Figure 5 can be used as a guide.

B Póczos

M Balcan

V Aleven

B Myers

C Harrison

B Vasilescu

W Scherlis

J Mostow

P Narasimhan

E Hovy

H Choset

Y SheikhI Gkioulekas

H Miller

F Jahanian

D Garlan

A Kittur

G Fedder

E Xing

S Rudich

B Lucia

M Shamos

T Berg-Kirkpatrick

K Carley

D BrumleyL Cranor

B Haeupler

K Kitani

S Narasimhan

T Kanade

D Woodruff

K Crary

N Sadeh

R Rosenfeld

U Acar

L Bauer

K Sycara

Z Bar-Joseph

J Bagnell

A Pfenning

L Dabbish

C Kulkarni

M O'Toole

L Ahn

R O'Donnell

A Gupta

H Admoni

D Wettergreen

S Balakrishnan

A Waibel

D O'Hallaron

C Atkeson

I Nourbakhsh

J Zimmerman

L Blum

P Gibbons

D Marculescu

B Moseley

K Rashmi

J Carbonell

E Nyberg

D Ramanan

C Goues

S Seshan

R SternS Kim

J McCann

R Murphy

D Andersen

B Raj

J Callan

J Bigham

S Brookes

M Fredrikson

R Rajkumar

Y Yang

G Gordon

T Mowry

Y Agarwal

F Pfenning

L Yao

M Mason

R Bryant

J Hammer

A Steinfeld

T Breaux

C Kingsford

K Crane

J Ma

S Goldstein

V Goyal

D Scott

L Morency

R Kraut

A Dubrawski

J Cassell

A Gupta

D Sicker

S Hudson

V Gligor

D Held

E Clarke

M SatyanarayananD Sleator

S Lucey

N Michael

A Acquisti

G Neubig

G Miller

J Herbsleb

N Beckmann

J Stamper

J Hodgins

G Blelloch

J Hong

A Datta

Y TsvetkovJ Kolter

J Baker

J Hoffmann

A Talwalkar

V Sekar

R Reddy

T Lee

A Pavlo

J Forlizzi

M Shaw

C RiviereH Geyer

M Hebert

M Harchol-Balter

A Ogan

A SinghR Salakhutdinov

J Morris

P Ravikumar

N Shah

W Whittaker

L Kara

G Ganger

N Pollard

J Aldrich

T Sandholm

K Fragkiadaki

R Harper

M Goel

B Parno

M Erdmann

T MitchellW Cohen

C Kästner

M Likhachev

D Siewiorek

A Platzer

K Koedinger

C Liu

C Faloutsos

P Steenkiste

A Procaccia

G Kaufman

R DannenbergK Zhang

V Guruswami

M Blum

A Kelly

R Marculescu

F Fang

C Langmead

J Sherry

A Black

J Hoe

C Rosé

S Kiesler

CSDRIMLDLTIHCIISRBIO

Figure 7: An embedding of Carnegie Mellon University’sSchool of Computer Science with colors indicating sub-departments. For example, the Robotics Institute (RI) hasclear clusters for Robotics, Graphics, CV and HRI.

10

Page 11: Venue Analytics: A Simple Alternative to Citation ... - arXiv

REFERENCES[1] Mokhtar Bin Abdullah. 1990. On a Robust Correlation Coefficient. Journal

of the Royal Statistical Society. Series D (The Statistician) 39, 4 (1990), 455–460.http://www.jstor.org/stable/2349088

[2] Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Craw-ford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman,Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-HanOoi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Wang, Chris Will-helm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni. 2018. Constructionof the Literature Graph in Semantic Scholar. In NAACL HLT. Association forComputational Linguistics, 84–91. https://doi.org/10.18653/v1/N18-3011

[3] David Arthur and Sergei Vassilvitskii. 2007. K-means++: The Advantages ofCareful Seeding. In SODA. SIAM, Philadelphia, PA, USA, 1027–1035.

[4] Robert M. Bell, Yehuda Koren, and Chris Volinsky. 2009. The BellKor solution tothe Netflix Prize.

[5] Shai Ben-David. 2015. Clustering is Easy When ....What? arXiv e-prints, ArticlearXiv:1510.05336 (Oct 2015), arXiv:1510.05336 pages. arXiv:stat.ML/1510.05336

[6] Emery Berger. 2018. CSRankings: Computer Science Rankings. http://csrankings.org/.

[7] Carl Bergstrom. 2007. Eigenfactor: Measuring the value and prestige of scholarlyjournals. College & Research Libraries News 68, 5 (2007), 314–316.

[8] Steve Blackburn, Carla Brodley, H. V. Jagadish, Kathryn S McKinley, MarioNascimento, Minjeong Shin, Sean Stockwel, Lexing Xie, and Qiongkai Xu. 2018.csmetrics.org: Institutional Publication Metrics for Computer Science. https://github.com/csmetrics/csmetrics.org/blob/master/docs/Overview.md.

[9] DavidMBlei, Andrew YNg, andMichael I Jordan. 2003. Latent dirichlet allocation.Journal of machine Learning research 3, Jan (2003), 993–1022.

[10] Lutz Bornmann and Hans-Dieter Daniel. 2008. What do citation counts measure?A review of studies on citing behavior. J. Doc. 64, 1 (2008), 45–80.

[11] D. Butler. 2011. Computing giants launch free science metrics. Nature 476, 7358(Aug 2011), 18.

[12] Aaron Clauset, Samuel Arbesman, and Daniel B Larremore. 2015. Systematicinequality and hierarchy in faculty hiring networks. Science Advances (2015).

[13] Adam Coates and Andrew Y Ng. 2011. The importance of encoding versustraining with sparse coding and vector quantization. In ICML. 921–928.

[14] Lisa Colledge, Félix de Moya-Anegón, Vicente P Guerrero-Bote, Carmen López-Illescas, Henk F Moed, et al. 2010. SJR and SNIP: two new journal metrics inElsevier’s Scopus. Insights 23, 3 (2010), 215.

[15] Jaime A Teixeira da Silva and Judit Dobránszki. 2018. Multiple versions of theh-index: Cautionary use for formal academic purposes. Scientometrics 115, 2(2018), 1107–1113.

[16] Jaime A Teixeira da Silva and Aamir Raoof Memon. 2017. CiteScore: A cite forsore eyes, or a valuable, transparent metric? Scientometrics 111, 1 (2017), 553–556.

[17] Times Higher Education. 2018. World University Rankings, Computer Sci-ence. https://www.timeshighereducation.com/world-university-rankings/2018/subject-ranking/computer-science.

[18] Matthew E. Falagas, Vasilios D. Kouranos, Ricardo Arencibia-Jorge, and Drosos E.Karageorgopoulos. 2008. Comparison of SCImago journal rank indicator withjournal impact factor. The FASEB Journal 22, 8 (2008), 2623–2628.

[19] Martin Fenner. 2013. What can article-level metrics do for you? PLoS biology 11,10 (2013), e1001687.

[20] National Science Foundation. 2018. Download Awards by Year. https://www.nsf.gov/awardsearch/download.jsp.

[21] Yoav Freund and Robert E Schapire. 1997. A Decision-Theoretic Generalizationof On-Line Learning and an Application to Boosting. J. Comput. System Sci. 55, 1(1997), 119 – 139. https://doi.org/10.1006/jcss.1997.1504

[22] Sebastian Galiani and Ramiro H Gálvez. 2017. The life cycle of scholarly articlesacross fields of research. Technical Report. National Bureau of Economic Research.

[23] Robert Geist, Madhu Chetuparambil, Stephen Hedetniemi, and A. Joe Turner.1996. Computing Research Programs in the U.S. Commun. ACM 39, 12 (Dec.1996), 96–99. https://doi.org/10.1145/240483.240505

[24] J E Hirsch. 2005. An index to quantify an individual’s scientific research output.Proc. Natl. Acad. Sci. U. S. A. 102, 46 (nov 2005), 16569–16572. https://doi.org/10.1073/pnas.0507655102

[25] Jeff Huang. 2018. Best Paper Awards in Computer Science (since 1996). https://jeffhuang.com/best_paper_awards.html.

[26] Peter J. Huber. 1964. Robust Estimation of a Location Parameter. Ann. Math.Statist. 35, 1 (03 1964), 73–101. https://doi.org/10.1214/aoms/1177703732

[27] Peter J. Huber. 1985. Projection Pursuit. The Annals of Statistics 13, 2 (1985),435–475. http://www.jstor.org/stable/2241175

[28] Nevada Policy Research Institute. 2018. Transparent California. https://transparentcalifornia.com/agencies/salaries/#university-system.

[29] Jill Johnes. 2018. University rankings: What do they really show? Scientometrics115, 1 (01 Apr 2018), 585–606. https://doi.org/10.1007/s11192-018-2666-1

[30] Michael J. Kurtz and Edwin A. Henneken. 2017. Measuring metrics - a 40-year longitudinal cross-validation of citations, downloads, and peer review inastrophysics. JAIST 68, 3 (2017), 695–708. https://doi.org/10.1002/asi.23689

[31] Sune Lehmann, Andrew D Jackson, and Benny E Lautrup. 2006. Measures formeasures. Nature 444, 7122 (2006), 1003.

[32] Michael Ley. 2002. The DBLP Computer Science Bibliography: Evolution, Re-search Issues, Perspectives. In String Processing and Information Retrieval, AlbertoH. F. Laender and Arlindo L. Oliveira (Eds.). Springer Berlin Heidelberg, 1–10.

[33] Sarah Theule Lubienski, Emily K. Miller, and Evthokia Stephanie Saclarides. 2018.Sex Differences in Doctoral Student Publication Rates. Educational Researcher 47,1 (2018), 76–81. https://doi.org/10.3102/0013189X17738746

[34] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficientestimation of word representations in vector space. ICLR (2013).

[35] Allison C. Morgan, Dimitrios J. Economou, Samuel F. Way, and Aaron Clauset.2018. Prestige drives epistemic inequality in the diffusion of scientific ideas. EPJData Science 7, 1 (19 Oct 2018), 40.

[36] Allison C. Morgan, Samuel F. Way, and Aaron Clauset. 2018. Automaticallyassembling a full census of an academic field. PLOS ONE 13, 8 (08 2018), 1–18.https://doi.org/10.1371/journal.pone.0202223

[37] US News and World Report. 2018. Best Computer Science Schools.https://www.usnews.com/best-graduate-schools/top-science-schools/computer-science-rankings.

[38] Andrew Nocka, Danning Zheng, Tianran Hu, and Jiebo Luo. 2014. Moneyball forAcademia: Toward Measuring and Maximizing Faculty Performance and Impact.In 2014 IEEE International Conference on Data Mining Workshop. 193–197.

[39] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. ThePageRank citation ranking: Bringing order to the web. Technical Report. Stanford.

[40] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M.Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour-napeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: MachineLearning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.

[41] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove:Global vectors for word representation. In EMNLP. 1532–1543.

[42] QS World University Rankings. 2018. Computer Science & Infor-mation Systems. https://www.topuniversities.com/university-rankings/university-subject-rankings/2018/computer-science-information-systems.

[43] Shanghai Rankings. 2015. Academic Ranking of World Universities in ComputerScience. http://www.shanghairanking.com/SubjectCS2015.html.

[44] Jie Ren and Richard N Taylor. 2007. Automatic and versatile publications rankingfor research institutions and scholars. Commun. ACM 50, 6 (2007), 81–85.

[45] Herbert Robbins and Sutton Monro. 1951. A Stochastic Approximation Method.The Annals of Mathematical Statistics 22, 3 (1951), 400–407.

[46] Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretationand validation of cluster analysis. J. Comput. Appl. Math. 20 (1987), 53 – 65.https://doi.org/10.1016/0377-0427(87)90125-7

[47] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learningrepresentations by back-propagating errors. Nature 323, 6088 (1986), 533–536.

[48] Boris Schauerte. 2014. Microsoft Academic: conference field ratings. http://www.conferenceranks.com/visualization/msar2014.html.

[49] Cagan H. Sekercioglu. 2008. Quantifying Coauthor Contributions. Science 322,5900 (2008), 371–371. https://doi.org/10.1126/science.322.5900.371a

[50] Nihar B. Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, and Ulrikevon Luxburg. 2018. Design and Analysis of the NIPS 2016 Review Process. JMLR19, 49 (2018), 1–34. http://jmlr.org/papers/v19/17-511.html

[51] Joseph Sill. 2010. Improved NBA adjusted+/-using regularization and out-of-sample testing. In MIT Sloan Sports Analytics Conference.

[52] Charles Spearman. 1904. The proof and measurement of association betweentwo things. The American journal of psychology 15, 1 (1904), 72–101.

[53] Richard S. J. Tol. 2008. A rational, successive g-index applied to economicsdepartments in Ireland. J. Informetrics 2 (2008), 149–155.

[54] Marco Valenzuela, Vu Ha, and Oren Etzioni. 2015. Identifying meaningful cita-tions. In AAAI Workshop: Scholarly Big Data.

[55] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data usingt-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.

[56] Slobodan Vucetic, Ashis Kumar Chanda, Shanshan Zhang, Tian Bai, and Anirud-dha Maiti. 2018. Peer Assessment of CS Doctoral Programs Shows Strong Corre-lation with Faculty Citations. Commun. ACM 61, 9 (Aug. 2018), 70–76.

[57] W. H. Walters. 2017. Citation-Based Journal Rankings: Key Questions, Metrics,and Data Sources. IEEE Access 5 (2017), 22036–22053. https://doi.org/10.1109/ACCESS.2017.2761400

[58] Su Yan and Dongwon Lee. 2007. Toward Alternative Measures for RankingVenues: A Case of Database Research Community. In Proceedings of the 7thACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’07). ACM, New York,NY, USA, 235–244.

[59] Alexander Yong. 2014. Critique of Hirsch’s citation index: A combinatorial Fermiproblem. Notices of the AMS 61, 9 (2014), 1040–1050.

[60] Fang Zhang and Shengli Wu. 2018. Ranking Scientific Papers and Venues inHeterogeneous Academic Networks by Mutual Reinforcement. In JCDL. ACM,New York, NY, USA, 127–130.

[61] Tong Zhang. 2004. Solving Large Scale Linear Prediction Problems Using Sto-chastic Gradient Descent Algorithms. In ICML. ACM, New York, NY, USA, 116–.

11

Page 12: Venue Analytics: A Simple Alternative to Citation ... - arXiv

0 10 20 30 40 50years since first publication

0.0

0.5

1.0

1.5

2.0

aver

age

annu

al v

alue

gen

erat

ed Aging Curve for Authors

Figure 9: The average productivity of all DBLP authors forthat year of their publishing career.

15 20 25 30 35 40 45 50Number of Clusters

0.525

0.550

0.575

0.600

0.625

0.650

0.675

Silh

ouet

te S

core

Optimal Number of Clusters

Figure 8: Curve showing the natural number of clusters inthe dataset using R1 authors

A CREDIT ASSIGNMENTIn order to address issues of collinearity raised by having authorswho publish papers together, wewanted to solve a credit assignmentproblem. We adapted a well-known method for addressing this byadapting regularized plus minus [51] from the sports analyticsliterature. In our case, we simply regress each publications valuesfrom its authors, as shown below.

©­­­­«

author1 author2 . . . authornpaper1 1 1 . . . 0paper2 1 0 . . . 0...

......

. . ....

paperm 0 1 . . . 1

ª®®®®¬©­­­­«x1x2...

xn

ª®®®®¬=

©­­­­«score1score2...

scoren

ª®®®®¬This technique produced scores that correlated highly with total

value scores. While this may be valuable for understanding anindividual’s contribution, but we were unable to find an evaluationfor this method. The form of this model that considers n-wise orpairwise relationships was also considered, but these matrices weretoo large for us to run trials.

B AGING CURVETo evaluate if our model makes a sensible prediction over thetimescale of a scholar’s career, we built a model to see what anaverage academic career looks like, given that the author is stillpublishing in those years. See Figure 9. Our model suggests a risein productivity for the first 20 years of one’s publishing history,and then a steady decline.

12

Page 13: Venue Analytics: A Simple Alternative to Citation ... - arXiv

1970 1980 1990 2000 2010 2020

2

4

6

8

10

Systems

INFOCOM (7.6)IEEE/ACM TON (7.4)MobiCom (6.0)MobiSys (5.9)SIGMETRICS (5.3)IEEE TMC (5.4)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7

Data Mining

WWW (7.1)KDD (6.1)SDM (5.0)IEEE-TKDE (4.9)CIKM (4.3)ICDM (4.3)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7

8

9Computer Vision

CVPR (8.0)ECCV (6.4)ICCV (0.0)PAMI (5.2)IJCV (3.8)WACV (3.9)

1970 1980 1990 2000 2010 2020

2

4

6

8

10

Parallel

IPDPS (5.8)IEEE-TPDS (4.9)SC (4.3)IPDPS Workshops (4.9)HPDC (4.1)IEEE Trans. Com (3.7)

1970 1980 1990 2000 2010 2020

2

4

6

8

EE Theory

Allerton (9.2)ISIT (6.4)IEEE TIT (6.0)ICASSP (3.8)ACSSC (5.2)ITW (3.8)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7CAD

DAC (5.2)ICCAD (5.0)Proceedings of (5.1)ICCD (4.6)ISLPED (4.0)IEEE TCAD (3.9)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

Robotics

RSS (6.4)ICRA (5.2)IJRR (5.3)HRI (4.2)WAFR (0.0)IROS (3.8)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7

HCI

CHI (7.3)CSCW (0.0)UIST (5.0)UbiComp (4.8)CHI Extended Ab (3.5)DIS (3.1)

1970 1980 1990 2000 2010 2020

2

4

6

8

10

12Theory

STOC (11.1)FOCS (10.1)CCC (9.7)SODA (10.2)SIAM JC (9.9)J. ACM (8.0)

1970 1980 1990 2000 2010 2020

2

4

6

8

Parallelism

SPAA (6.0)SIGACT News (4.5)Algorithmica (3.6)LATIN (2.5)Theor. Comput. (2.2)Inf. Comput. (2.4)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

Cyber Physical

HSCC (5.3)CDC (4.6)IEEE-TAC (3.7)ACC (3.7)SIAM Journal on (2.8)IEEE-CNS (4.9)

1970 1980 1990 2000 2010 2020

2

4

6

8

10

12AI

AAAI (7.8)EC (6.1)IJCAI (5.6)AAMAS (4.9)SIGecom Exchang (4.1)ACM TEAC (6.0)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7NLP

EMNLP (6.8)ACL (6.0)NAACL (5.6)TACL (5.4)COLING (3.4)Computational L (2.7)

1970 1980 1990 2000 2010 2020

2

4

6

8

10Parallelism

PODC (5.4)DISC (4.1)Distributed Com (2.7)OPODIS (1.6)ICDCN (1.6)SSS (0.8)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7

8

Graphics

SIGGRAPH (7.4)SIGGRAPH Asia (8.1)TOG (5.5)TVCG (4.7)CGF (4.9)IEEE Vis (4.8)

1970 1980 1990 2000 2010 2020

2

4

6

8

Computer Science

Commun. ACM (6.7)IEEE Computer (3.2)Advances in Com (1.3)SCIENCE CHINA I (1.3)CIG (1.1)CEC (0.7)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7

8

Software Engineering

SIGSOFT FSE (6.3)ICSE (5.9)ISSTA (5.2)ASE (4.6)ECOOP (3.3)IEEE Trans. Sof (2.9)

1970 1980 1990 2000 2010 2020

1

2

3

4

5

6

7

Computational Biology

RECOMB (5.4)JCB (3.0)WABI (3.8)Bioinformatics (2.9)IEEE TCBB (2.7)BCB (3.0)

1970 1980 1990 2000 2010 2020

2

4

6

8

10

Machine Learning

NIPS (10.9)ICML (9.7)COLT (7.9)AISTATS (7.7)JMLR (6.0)UAI (5.5)

1970 1980 1990 2000 2010 2020

2

4

6

8

10Networking

HotNets (9.7)CCS (8.7)NSDI (8.9)SIGCOMM (7.7)USENIX Security (7.8)NDSS (7.9)

1970 1980 1990 2000 2010 2020

2

4

6

8

10Programming Languages

PLDI (7.9)OOPSLA (7.7)POPL (7.5)CAV (5.3)VMCAI (3.6)TPLS (3.4)

1970 1980 1990 2000 2010 2020

2

4

6

8

10

Databases

VLDB (7.6)SIGMOD (7.6)ICDE (5.8)PODS (5.6)CIDR (5.5)TDS (4.9)

1970 1980 1990 2000 2010 2020

2

4

6

8

Architecture

ASPLOS (8.7)HotOS (6.6)MICRO (6.1)ISCA (6.2)HPCA (5.8)PPOPP (5.5)

1970 1980 1990 2000 2010 2020

2

4

6

8

Computational Geometry

SCG (4.1)DCG (2.5)CCCG (2.2)Comput. Geom. (2.0)ISAAC (2.3)WADS (0.0)

1970 1980 1990 2000 2010 2020

0.5

1.0

1.5

2.0

IVA (1.9)ACII (0.0)SocInfo (1.9)JASIST (1.0)ASIST (0.0)

1970 1980 1990 2000 2010 2020

2

4

6

8

Crypto

TCC (9.1)CRYPTO (6.8)EUROCRYPT (7.3)J. Cryptology (4.8)ASIACRYPT (3.6)PKC (4.0)

1970 1980 1990 2000 2010 2020

0.2

0.4

0.6

0.8

1.0

1.2

Softw. Test., V (1.2)SEKE (0.7)International J (0.8)CSEE&T (0.0)

1970 1980 1990 2000 2010 2020

0.5

1.0

1.5

2.0

2.5

3.0

IEEE Trans. Inf (2.9)WIFS (1.0)ICB (0.8)BTAS (0.9)

Figure 10: An automatic clustering and rating of the entirity of CS.

13

Page 14: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 9: Top 55 Venues

Name Score Size

STOC 11.71 128FOCS 11.04 120NIPS 10.94 484Conference on Computational Complexity 10.41 53SODA 10.17 224SIAM J. Comput. 9.98 138HotNets 9.53 46ICML 9.46 303TCC 9.06 76NSDI 8.76 69CCS 8.76 114INFOCOM 8.59 434ITCS 8.56 103Allerton 8.53 331ASPLOS 8.43 48SIGCOMM 8.31 55CVPR 8.22 606PLDI 8.05 68J. ACM 8.03 82Theory of Computing 7.94 29USENIX Security Symposium 7.91 62NDSS 7.86 63APPROX-RANDOM 7.86 74COLT 7.78 82VLDB 7.75 185IEEE/ACM Trans. Netw. 7.74 226AAAI 7.73 403SIGMOD Conference 7.62 98SIGGRAPH 7.59 125AISTATS 7.52 138OOPSLA 7.41 75SIGGRAPH Asia 7.38 140CRYPTO 7.24 80EUROCRYPT 7.24 78POPL 7.22 68CHI 7.15 431IEEE SSP 7.11 54WWW 6.82 232HotOS 6.75 27EMNLP 6.61 263EC 6.60 63Commun. ACM 6.59 222GetMobile 6.50 41IMC 6.49 70OSDI 6.46 29ICDE 6.44 183SIGSOFT FSE 6.42 93ECCV 6.42 252ICALP 6.40 148ISIT 6.28 1032KDD 6.23 240ICCV 6.22 292Robotics: Science and Systems 6.20 88MICRO 6.19 62ISCA 6.18 7314

Page 15: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 10: Top 55 Universities

school authors papers venue score size normed

Carnegie Mellon University 174 17275 73126 14159University of California - Berkeley 108 11011 51062 10884Massachusetts Institute of Technology 108 9613 47037 10026Univ. of Illinois at Urbana-Champaign 103 11248 44714 9627Technion 96 8795 39601 8656Stanford University 69 7028 36169 8513Georgia Institute of Technology 108 9337 38083 8118Tsinghua University 150 14643 40662 8104University of California - Los Angeles 46 6589 30368 7888University of Michigan 83 7897 33666 7598Tel Aviv University 48 5468 28816 7404University of California - San Diego 74 6646 30836 7142University of Maryland - College Park 76 7751 30795 7089ETH Zurich 39 6673 26120 7081Cornell University 81 5918 30278 6871University of Washington 69 5727 29087 6846University of Southern California 58 7206 26044 6387Columbia University 53 5064 25251 6330EPFL 55 6927 25320 6290Princeton University 47 4991 24203 6252HKUST 57 6640 25134 6190National University of Singapore 75 7210 25123 5801University of Pennsylvania 53 5340 22696 5690University of California - Irvine 71 6624 24070 5628Pennsylvania State University 49 5668 21325 5451Peking University 147 11858 27050 5413University of Toronto 99 5955 24759 5376University of Texas at Austin 52 4485 21225 5346University of California - Santa Barbara 38 4698 19137 5224University of Waterloo 105 7316 23781 5099Max Planck Institute 32 4307 17808 5093Purdue University 71 5659 21734 5082New York University 67 4915 20752 4918Rutgers University 63 4935 20311 4884Nanyang Technological University 71 9143 20827 4870University of Wisconsin - Madison 54 3988 18986 4738University of Massachusetts Amherst 57 4129 19155 4717Hebrew University of Jerusalem 42 3316 17478 4647University of California - Riverside 43 4230 17114 4522Shanghai Jiao Tong University 77 8220 19652 4511Zhejiang University 89 7531 20232 4496Ohio State University 42 4121 16740 4451Duke University 23 2879 13936 4385Chinese Academy of Sciences 46 5795 16497 4285University of Minnesota 43 4346 15855 4190Northwestern University 49 3993 16183 4137Stony Brook University 47 4565 15819 4086University of Edinburgh 115 6461 19292 4058University of Tokyo 80 7524 17762 4042University of Oxford 69 5549 17160 4039Chinese University of Hong Kong 32 3927 13843 3959TU Munich 53 6432 15521 3891KAIST 81 5591 17114 3884Harvard University 34 2471 13640 3837Imperial College London 65 6622 15834 377915

Page 16: Venue Analytics: A Simple Alternative to Citation ... - arXiv

Table 11: Top 50 Authors

Name Score

Philip S. Yu 4354H. Vincent Poor 3873Kang G. Shin 3540Jiawei Han 0001 3143Micha Sharir 3086Thomas S. Huang 2992Don Towsley 2913Xuemin Shen 2755Luc J. Van Gool 2700Noga Alon 2596Christos H. Papadimitriou 2531Leonidas J. Guibas 2525Mahmut T. Kandemir 2467Jie Wu 0001 2452Wen Gao 0001 2416Shuicheng Yan 2373Rama Chellappa 2286Alberto L. Sangiovanni-Vincentelli 2266Georgios B. Giannakis 2266Yunhao Liu 2257Mohamed-Slim Alouini 2238Michael I. Jordan 2213Christos Faloutsos 2173Massoud Pedram 2153Dacheng Tao 2149Hans-Peter Seidel 2145Luca Benini 2139Moshe Y. Vardi 2092Lajos Hanzo 2081Yishay Mansour 2073Sudhakar M. Reddy 2070Sajal K. Das 2045Pankaj K. Agarwal 2042Avi Wigderson 2024Robert E. Tarjan 2019Irith Pomeranz 2003Elisa Bertino 2003Zhu Han 1992Shlomo Shamai 1976David Peleg 1974Victor C. M. Leung 1966Kaushik Roy 0001 1949Xiaoou Tang 1938Hector Garcia-Molina 1923Larry S. Davis 1918Rafail Ostrovsky 1895Eric P. Xing 1881Ness B. Shroff 1879Krishnendu Chakrabarty 1868Hai Jin 0001 1867

16