Empirical Patterns in Google Scholar Citation Counts (CyberPatterns 2014)
Empirical Patterns in
Google Scholar Citation Counts
Peter T. Breuer, University of Birmingham
Jonathan P. Bowen, Birmingham City University
http://bham.academia.edu/PeterBreuer/Talks
Peter Breuer's Google Scholar Page
Alan Turing's Google Scholar Page
Although these pages are very different ...
● ...they both share an underlying pattern
(and so does everybody else's we've examined)
To see the pattern ...
● … graph the citation numbers that appear going down the page against their rank
– A 108
– B 73
– C 65
– D 55
– E 44
– …
Alan Turing's ...
● … has the same shape, but sharper
– A 8093
– B 8093
– C 6902
– D 6684
– E 803
– …
– Z 1
Graphs look scale invariant.
– if we see part of a cites graph in isolation
– and don't know what the numbers are
• Can we tell which part of the graph we are looking at?
The answer is “no”
● Citation graphs are scale-invariant
– Count the number of citation counts that begin with the digit “1”, “2”, “3”, …, “9”
It's “Benford's Law”
● For scale-invariant distributions
– C(1) > C(2) > .. > C(9)
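The leading-digit count described above can be sketched in a few lines of Python. This is only an illustration: the function name `leading_digit_counts` is made up here, and in the sample list only the first five values come from the slides; the rest are invented to pad out the example.

```python
from collections import Counter

def leading_digit_counts(cite_counts):
    """Return C(d) for d = 1..9: how many citation counts start with digit d.

    Zero counts are skipped because they have no leading digit in 1..9.
    """
    digits = Counter(int(str(c)[0]) for c in cite_counts if c > 0)
    return {d: digits.get(d, 0) for d in range(1, 10)}

# First five values are from the slide; the remainder are hypothetical.
example_counts = [108, 73, 65, 55, 44, 31, 27, 19, 17, 14, 12, 11, 9, 6, 3, 2, 1, 1]
for digit, count in leading_digit_counts(example_counts).items():
    print(digit, count)
```

On a real citation list, Benford-like behaviour would show up as counts that tend to fall from digit 1 to digit 9; a short list like this one is too small to show it reliably.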
What kind of scale-invariance?
● We believe cite counts against ranking are
$x_n \sim x_0 e^{-P\sqrt{n}}$
$\log(x_n/x_0) \sim -P\sqrt{n}$
$\log(-\log(x_n/x_0)) \sim \log(P) + 0.5 \log(n)$
Log-log cites graph should be a straight line!
● It is! (my data)
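A minimal sketch of how the straight-line check could be done: an ordinary least-squares fit of log(log(x0/xn)) against log(n). The function name `fit_loglog` and the convention that the top-cited paper has rank 0 are assumptions made here, and the data is synthetic, generated from the model itself, not taken from either Google Scholar page.

```python
import math

def fit_loglog(cites):
    """Fit log(log(x0/xn)) = log(P) + A*log(n) by least squares.

    `cites` is ordered by rank with cites[0] = x0 (rank 0, an assumed
    convention). Entries with xn >= x0 or xn <= 0 are skipped, since the
    double logarithm is undefined for them.
    Returns (A, P): the log-log slope and the decay parameter.
    """
    x0 = cites[0]
    pts = [(math.log(n), math.log(math.log(x0 / x)))
           for n, x in enumerate(cites[1:], start=1) if 0 < x < x0]
    mean_u = sum(u for u, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in pts)
             / sum((u - mean_u) ** 2 for u, _ in pts))
    intercept = mean_v - slope * mean_u
    return slope, math.exp(intercept)

# Synthetic counts from x_n = x0 * exp(-P * sqrt(n)) with x0 = 100, P = 0.3.
synthetic = [100.0 * math.exp(-0.3 * math.sqrt(n)) for n in range(50)]
A, P = fit_loglog(synthetic)
print(A, P)
```

Because the synthetic data follows the model exactly, the fit should recover a slope close to 0.5 and P close to 0.3; real citation lists are integer-valued and noisy, so the recovered slope would only be approximate.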
But Alan Turing's log-log data..
● … follows a different straight line
slope 0.4, not 0.5
The generic cites curve
$\log(\log(x_0/x_n)) \sim \log(P) + A \log(n)$
(general slope A in place of 0.5)
But why?
We need a statistical model
Statistical Model
● log(X/X0) is normally distributed
– σ = µ = λ ∼ 0.2
● Trials X ranked in order look like cite counts!
● Log-log slope A depends on number of trials N
Some Related Work!
● Others report that the distribution of citation counts is
– Log-normal
– But investigate across whole fields
• Normalization WRT average across field
– Radicchi et al, 2008
• Universality of Citation Distributions
● Single researcher's work defines own “field”
– We normalize WRT max cite, not average
– But same underlying statistical model fits well
Understand
● Individual papers have worth distributed as
– Poisson/normal σ = µ ∼ 0.2 cites/paper
● Cite count depends exponentially on worth
– Being cited earns more citations
Predictions
● $x_n \sim x_0 e^{-P\sqrt{n}}$
– good estimate for P: $\sqrt{2x_0/S}$, where S = total number of citations
– at ~150 cited papers $\sqrt{n} \to n^{0.4}$ (the exponent A drops from 0.5 to 0.4)
● $\log(10/x_0)/\log(20/x_0) \sim (i_{10}/i_{20})^A$
– $i_k$ is number of papers cited > k times
● I have $i_{10}/i_{20}$ = 1.98, predicted 2.08
● Alan Turing has 1.44, predicted 1.32
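Both predictions on this slide are simple enough to compute directly. The estimate for P follows from approximating the total S by the integral of x0·exp(−P√n) over n ≥ 0, which equals 2·x0/P². The ratio prediction follows from setting x_n = 10 at rank i10 and x_n = 20 at rank i20 in the generic curve. The sketch below assumes exactly those formulas; the function names are invented for illustration.

```python
import math

def estimate_p(x0, total_cites):
    """Estimate P as sqrt(2*x0/S).

    Derived from S ~ integral_0^inf x0*exp(-P*sqrt(n)) dn = 2*x0/P**2,
    i.e. it assumes the A = 0.5 form of the curve.
    """
    return math.sqrt(2.0 * x0 / total_cites)

def predicted_index_ratio(x0, a, j=10, k=20):
    """Predicted i_j / i_k from log(j/x0)/log(k/x0) ~ (i_j/i_k)**a."""
    return (math.log(x0 / j) / math.log(x0 / k)) ** (1.0 / a)

# Alan Turing's top count from the slides (8093) with slope A = 0.4.
print(predicted_index_ratio(8093, 0.4))
```

With x0 = 8093 and A = 0.4 this evaluates to roughly 1.31, close to the 1.32 quoted above for Alan Turing; the small difference presumably comes from rounding or from the exact fitted values used in the talk.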
More Predictions
● h-index h ~ $i_h$
– Number h of papers cited at least h times
● Predict $(i_{10}/h)^A = \log(x_0/10)/\log(x_0/h)$
– For me, solves for h=16.67, have h=16
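The h-index relation has h on both sides, so it needs a numerical solve. One simple option is bisection, sketched below under stated assumptions: the function name `predict_h` is invented, and the value i10 = 27 in the usage line is hypothetical, since the slides do not state the i10 figure that was used.

```python
import math

def predict_h(i10, x0, a, tol=1e-9):
    """Solve (i10/h)**a = log(x0/10) / log(x0/h) for h by bisection.

    Rearranged as f(h) = (i10/h)**a * log(x0/h) - log(x0/10) = 0.
    On 1 <= h <= x0 both factors are positive and decreasing, so f is
    decreasing and has at most one root there. Requires x0 > 10.
    """
    target = math.log(x0 / 10.0)

    def f(h):
        return (i10 / h) ** a * math.log(x0 / h) - target

    lo, hi = 1.0, float(x0)  # f(hi) = -target < 0 when x0 > 10
    if f(lo) <= 0:
        raise ValueError("no root in (1, x0); check the inputs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# x0 = 108 and A = 0.5 are from the slides; i10 = 27 is a hypothetical input.
print(predict_h(27, 108, 0.5))
```

As a sanity check on the formula, setting i10 = 10 makes h = 10 an exact solution, which is what the definitions of i10 and h require.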
Conclusion
● Three parameters $x_0$, A, P
– Determine graph of cites against rank
– For most individuals, A=0.5
• Drops with greater number N of publications
● Predictions based on the statistical curve …
– Relate $i_{10}$, h, etc measures
● Independently of the parameters (except A)
● Everybody's #cite/rank graph is “the same”