PAGERANK-RELATED METHODS FOR ANALYZING CITATION NETWORKS
Authors: Ludo Waltman and Erjia Yan
Presenter: Erjia Yan
Boğaziçi University, Istanbul
ISSI, June 29
• Objectives
– understanding PageRank
– applications of PageRank in informetric research
– tutorial: extracting journal citation networks from bibliographic data
– tutorial: computing PageRank for journals in journal citation networks using Sci2 and MATLAB
Objectives | 2
NON-RECURSIVE
• journal impact factor
• h-index
• cumulative number of citations
• cumulative number of publications
• …
RECURSIVE
• PageRank and its variants
– AuthorRank (Liu et al., 2005)
– Y-factor (Bollen et al., 2006)
– CiteRank (Walker et al., 2007)
– FutureRank (Sayyadi & Getoor, 2009)
– Eigenfactor (Bergstrom & West, 2008)
– SCImago (SCImago, 2007)
– weighted PageRank (Ding, 2011; Yan & Ding, 2011)
– …
A comparison | 3
NON-RECURSIVE RECURSIVE
A comparison | 4
• Observations
– non-recursive methods take into account only the local structure of a citation network; thus, a citation originating from Nature or Science has the same weight as a citation originating from an obscure journal
• Motivations
– use recursive methods to take into account the global structure of a citation network, so that citations originating from highly cited nodes are given more weight than those originating from rarely cited nodes
Observations and motivations | 5
• Basics of PageRank
– the concept was first proposed by Pinski and Narin in 1976 (influence weight); PageRank was introduced as a method for ranking web pages by Brin and Page in 1998
• Formulation
$$p_i = \alpha \sum_{j \in B_i} \frac{p_j}{m_j} + \frac{1 - \alpha}{n}$$
– where p_i denotes the PageRank value of web page i, α denotes the damping factor parameter, B_i denotes the set of all web pages that link to web page i, m_j denotes the number of web pages to which web page j links, and n denotes the total number of web pages to be ranked.
Basics of PageRank | 6
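The formulation can be tried out on a toy example. The following is a minimal Python sketch, not the presenters' code: the function name pagerank_update and the three-page web are hypothetical, and every page is assumed to have at least one outgoing link.

```python
def pagerank_update(p, out_links, alpha=0.85):
    """Apply the formula once: p_i = alpha * sum_{j in B_i} p_j / m_j + (1 - alpha) / n."""
    n = len(p)
    new_p = {}
    for i in p:
        # B_i: the pages j that link to page i.
        incoming = [j for j, targets in out_links.items() if i in targets]
        # m_j is the number of pages that page j links to.
        votes = sum(p[j] / len(out_links[j]) for j in incoming)
        new_p[i] = alpha * votes + (1 - alpha) / n
    return new_p


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
out_links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}

# Start from a uniform distribution and repeat the update until it settles.
p = {page: 1 / len(out_links) for page in out_links}
for _ in range(100):
    p = pagerank_update(p, out_links)

print(p)
```

In this toy web, page C ends up with the highest value: it receives the full vote of B plus half of the vote of A, whereas B receives only half of the vote of A.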
• In other words…
– the larger the number of web pages that link to web page i, the higher the PageRank value of web page i
– the higher the PageRank values of the web pages that link to web page i, the higher the PageRank value of web page i
– for those web pages that link to web page i, the smaller the number of other web pages to which these web pages link, the higher the PageRank value of web page i
– the closer the damping factor parameter α is set to 1, the stronger the above effects
PageRank meanings | 7
• On the damping factor
– 1: PageRank is not guaranteed to converge
– just below 1 (e.g., 0.9999): extremely sensitive to small changes in the network of links
– 0.5: according to Chen et al. (2007), 0.5 is preferred for citation networks, based on the assumption that authors on average will browse as far as two degrees of references (references and the references’ cited references; thus 1 − 1/2 = 0.5)
– 0.85: the default (coincides with the “six degrees of separation”: 1 − 1/6 ≈ 0.83, close to 0.85)
Damping factor | 8
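One way to read the values 0.5 and 0.85 is sketched below. It assumes (this is an interpretation, not something stated on the slides) that a reader continues to the next reference with probability α at each step and stops otherwise, so that the browsing depth N is geometrically distributed with mean L.

```latex
\mathbb{E}[N] \;=\; \sum_{k=1}^{\infty} k\,\alpha^{k-1}(1-\alpha) \;=\; \frac{1}{1-\alpha} \;=\; L
\quad\Longrightarrow\quad
\alpha \;=\; 1 - \frac{1}{L},
\qquad
L = 2 \;\Rightarrow\; \alpha = 0.5,
\qquad
L = 6 \;\Rightarrow\; \alpha = \tfrac{5}{6} \approx 0.83 .
```

Under this reading, a larger damping factor corresponds to longer browsing chains before a random jump.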
• Applications
– Analyzing journal citation networks
• Y-factor; Eigenfactor; SCImago Journal Rank (SJR)
– Analyzing author citation networks
• SARA (science author rank algorithm)
– Analyzing document citation networks
• CiteRank
Applications | 9
TUTORIALS
Tutorials | 10
• Tools we need
– Sci2: https://sci2.cns.iu.edu/user/index.php
– Sci2 plugins: http://wiki.cns.iu.edu/display/SCI2TUTORIAL/3.2+Additional+Plugins
– MATLAB or Octave: http://www.gnu.org/software/octave/
• Data materials
– http://www.pages.drexel.edu/~ey86/p/tutorial/
Tools and materials | 11
Steps 1-5 | 12
• Step 6: merge individually downloaded files
– on Windows systems, a command such as copy *.txt merged_data.txt can be entered in the Command Prompt
– in the resulting file, make sure to remove all lines ‘FN Thomson Reuters Web of Knowledge VR 1.0’ except for the first one and all lines ‘EF’ except for the last one
• Step 7: change file extension
– change the extension of the text file that contains your bibliographic data from .txt to .isi
Steps 6-7 | 13
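Steps 6 and 7 can also be scripted. The following is a minimal Python sketch under stated assumptions: the function names and the output name merged_data.isi are hypothetical, header lines are assumed to start with 'FN ' or 'VR ' in the first column, and the end-of-file marker is assumed to be a line containing only 'EF'.

```python
from pathlib import Path


def merge_isi_records(texts):
    """Merge export texts, keeping the first header and a single final 'EF' line."""
    header, body = [], []
    for index, text in enumerate(texts):
        for line in text.splitlines():
            # Assumption: header tags sit in the first column of the line.
            if line.startswith(("FN ", "VR ")):
                if index == 0:
                    header.append(line)  # keep the header of the first file only
                continue
            if line.rstrip() == "EF":
                continue  # dropped here, written once at the very end
            body.append(line)
    return "\n".join(header + body + ["EF"]) + "\n"


def merge_files(folder, output_name="merged_data.isi"):
    """Merge every .txt file in a folder and write the result with an .isi extension."""
    folder = Path(folder)
    sources = sorted(folder.glob("*.txt"))
    # 'utf-8-sig' drops a byte-order mark if the export starts with one.
    texts = [path.read_text(encoding="utf-8-sig") for path in sources]
    output = folder / output_name
    output.write_text(merge_isi_records(texts), encoding="utf-8")
    return output
```

Calling merge_files on the folder that holds the downloaded files would write the merged .isi file next to them; since the output does not end in .txt, it is not picked up again on a second run.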
Steps 8-9 | 14
Steps 10-12 | 15
Step 13 | 16
Steps 14-19 | 17
Step 19 | 18
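function p = calc_PageRank(C, alpha, n_iterations)
% CALC_PAGERANK  PageRank scores for a citation matrix using the power method.
%   C(i, j) is the number of citations from node i to node j, alpha is the
%   damping factor, and n_iterations is the number of power-method iterations.

% Take care of dangling nodes: a node without outgoing citations is treated
% as citing every node equally.
m = sum(C, 2);
C(m == 0, :) = 1;

% Create a row-normalized matrix, so that each row sums to 1.
n = size(C, 1);
m = sum(C, 2);
C = spdiags(1 ./ m, 0, n, n) * C;

% Apply the power method, starting from a uniform distribution.
p = repmat(1 / n, [1 n]);
for i = 1:n_iterations
    p = alpha * p * C + (1 - alpha) / n;
end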
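For readers without MATLAB or Octave, the same three steps (dangling nodes, row normalization, power method) can be sketched in plain Python. This is an illustrative port rather than the presenters' code; the name calc_pagerank, the default parameter values, and the small citation matrix are assumptions.

```python
def calc_pagerank(C, alpha=0.85, n_iterations=100):
    """Power method on a citation matrix C, where C[i][j] counts citations from i to j."""
    n = len(C)
    # Dangling nodes (rows without outgoing citations) are treated as citing every node.
    rows = [row if sum(row) > 0 else [1] * n for row in C]
    # Row-normalize so that each row sums to 1.
    rows = [[value / sum(row) for value in row] for row in rows]
    # Start from a uniform distribution and iterate p = alpha * p * C + (1 - alpha) / n.
    p = [1 / n] * n
    for _ in range(n_iterations):
        p = [alpha * sum(p[i] * rows[i][j] for i in range(n)) + (1 - alpha) / n
             for j in range(n)]
    return p


# Hypothetical matrix for three journals; journal 2 cites nobody (a dangling node).
C = [[0, 2, 1],
     [0, 0, 3],
     [0, 0, 0]]
scores = calc_pagerank(C)
print(scores)
```

The scores sum to 1, and in this example journal 2, which is cited by both other journals, receives the highest score.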
Steps 20-21 | 19
The resulting PageRank scores for the journals
• PageRank scores for author and document citation networks can be obtained by extracting the appropriate networks in Sci2
Other citation network types | 20
• Questions?
• Any further questions can be directed to:
– Erjia Yan [email protected] or
– Ludo Waltman [email protected]
Thank you | 21