The Simultaneous Evolution of Article and Author Networks in PNAS

Katy Börner, School of Library and Information Science, katy@indiana.edu

Jeegar T. Maru, Computer Science, jmaru@indiana.edu

Robert L. Goldstone, Psychology, rgoldsto@indiana.edu

Process Models vs. Descriptive Models of Scientific Evolution and Structure

Demo!


Katy Börner, Jeegar T. Maru, Robert L. Goldstone: The Simultaneous Evolution of Article and Author Networks in PNAS. Presented at the Mapping Knowledge Domains, Arthur M. Sackler Colloquium, Irvine, CA, May 9-11, 2003.

Overview

1. Descriptive Models vs. Process Models
2. Network Properties and Network Models
3. Simple Process Model of PNAS Data
4. Model Validation
5. Discussion
6. Challenges & Opportunities

[Figure: Citation Distribution of PNAS Article Data (observed values and linear fit)]


1. Descriptive Models vs. Process Models

Descriptive Models
Aim to describe the major features of a (typically static) data set, e.g., statistical patterns of article citation counts, networks of citations, individual differences in citation practice, the composition of knowledge domains, and the identification of research fronts as indicated by new but highly cited papers.

Process Models
Aim to simulate, statistically describe, or formally reproduce the statistical and dynamic characteristics of interest. Of particular interest are models that “conform to the measured data not only on the level where the discovery was originally made but also at the level where the more elementary mechanisms are observable and verifiable” (Willinger, Govindan, Jamin, Paxson, & Shenker, 2002, p. 2575).

Bibliometrics, Scientometrics, or KDVis

Statistical Physics and Sociology


Process Models

Can be used to predict the effects of
• Large collaborations vs. single author research on information diffusion.
• Different publishing mechanisms, e.g., E-journals vs. books, on co-authorship, speed of publication, etc.
• Supporting interdisciplinary collaborations (shallow science? or decrease in duplication?).
• Many small vs. one large grant on # publications, Ph.D. students, etc.
• Resource distribution on research output.
• …

In general, process models provide a means to analyze the structure and dynamics of science, i.e., to study science using the scientific methods of science, as suggested by Derek J. de Solla Price about 40 years ago.

We now have the data, code, and compute power to do this!


Process Models

In Sociology, several mathematical models of network evolution have been developed (Banks & Carley, 95). Most assume a fixed number of edges.

Snijders’ Simulation Investigation for Empirical Network Analysis (SIENA) (http://stat.gamma.rug.nl/snijders/siena.html) is a probabilistic model for the evolution of social networks. It assumes a directed graph with a fixed set of actors.

Recent work in Statistical Physics aims to design models and analytical tools to analyze the statistical mechanics of topology and dynamics of real world networks. Of particular interest is the identification of elementary mechanisms that lead to the emergence of small-world (Albert & Barabási, 2002; Watts, 1999) and scale free network structures (Barabási, Albert, & Jeong, 2000). The models assume nodes of one type (e.g., web page, paper, author).


Network Ecologies

Most real world networks exist within a delicate ecology of networks.

To fully understand, e.g., the knowledge diffusion among authors via their papers, both networks need to be considered simultaneously.

[Figure: network ecology diagram with elements labeled Grants, Co-authoring, Ph.D. Students, Papers, and Authors]


2. Network Properties & Network Models

Small World Networks
• Number of vertices (n)
• Average degree of network nodes <k>
• Characteristic path length (l) measures typical distance between two nodes (global property)
• Clustering coefficient (C) measures cliquishness of a typical neighborhood (local property)

Scale Free Networks
• Exponent of power-law distribution (γ): the frequency f of the degree of connectivity k of a vertex is a power function of k, $f \sim k^{-\gamma}$

E.g., very few authors have many collaborators, very few papers attract many citations.
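The small-world measures listed above can be illustrated on a toy graph. The following is a minimal sketch, not the code used for the analysis: the function names, the adjacency-dictionary representation, and the toy graph are assumptions made for illustration. It uses the per-node average definition of the clustering coefficient; other definitions exist, and the slides do not say which one was applied.

```python
from collections import deque
from itertools import combinations


def average_degree(adj):
    """<k>: mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """C: per-node share of neighbor pairs that are themselves linked, averaged over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: nodes with fewer than two neighbors contribute 0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """l: mean shortest-path length over all connected ordered node pairs (breadth-first search)."""
    dist_sum, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs


# Hypothetical toy co-author graph: a triangle a-b-c plus one extra author d linked to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(len(toy), average_degree(toy), characteristic_path_length(toy), clustering_coefficient(toy))
```

For the toy graph, <k> is 2, l is 4/3, and C is 7/12; the graph must be undirected, i.e., every link has to appear in both neighbor sets.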


Properties of Diverse Networks

Source: Albert, R., & Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47-97.


For undirected co-author networks, the in-degree of a node equals its out-degree and hence the exponents for both distributions are identical. For directed paper citation networks, the number of references is rather small and constant. Only the in-degree distribution (received citations) is considered.


Models for Evolving Networks

Recommended Reading
• Albert & Barabási (2002). Statistical mechanics of complex networks.
• Dorogovtsev, S. N., & Mendes, J. F. F. (2002). Evolution of networks.
• Newman, M. E. J. (2001). Scientific collaboration networks. I. Network construction and fundamental results.
• Newman, M. E. J. (2001). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality.

Scale Free Networks are typically simulated by processes of incremental growth, preferential attachment, and rewiring. Preferential attachment supports a “rich get richer” phenomenon. Paper citations and co-authorships are fixed - no rewiring.
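Incremental growth with explicit preferential attachment (and no rewiring) can be sketched in a few lines. This is a generic illustration of the mechanism described in the recommended reading, not the model presented in these slides; the function names, the starting core, and the parameters `n_nodes`, `m_links`, and `seed` are assumptions.

```python
import random


def grow_preferential(n_nodes, m_links, seed=0):
    """Grow a network node by node; each new node links to m_links distinct existing nodes
    chosen with probability proportional to their current degree. Links are never rewired."""
    rng = random.Random(seed)
    # Assumed starting point: a small fully connected core of m_links + 1 nodes.
    edges = [(i, j) for i in range(m_links + 1) for j in range(i)]
    # Every node appears here once per incident link, so a uniform draw is degree-proportional.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m_links + 1, n_nodes):
        targets = set()
        while len(targets) < m_links:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degrees(edges):
    """Count the number of links attached to each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


edges = grow_preferential(n_nodes=200, m_links=2)
deg = degrees(edges)
print(len(edges), min(deg.values()), max(deg.values()))
```

Early nodes accumulate links over time while the most recent nodes stay at the minimum degree, which is the “rich get richer” effect in its simplest form.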


Preferential Attachment

Well-connected authors, articles, and web pages tend to attract still more connections.

Preferential attachment will be modeled as an emergent property of the elementary networking activity of authors reading and citing articles as well as the references listed in those articles. Analogously, authors may consider collaborating with co-authors of their co-authors, linking to web pages linked from web pages they read, etc.
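The idea that preferential attachment can emerge from reading references, without any degree-based rule, can be sketched as follows. This is a strongly simplified illustration under stated assumptions, not the authors' implementation: papers are added one at a time, each new paper "reads" `n_read` existing papers chosen uniformly at random, cites them, and additionally cites one reference of each paper read (one level of references). Authors, topics, and years are left out, and all names are hypothetical.

```python
import random


def simulate_citations(n_papers, n_read=2, seed=1):
    """Return a dict mapping each paper id to the sorted list of paper ids it cites."""
    rng = random.Random(seed)
    references = {0: []}  # the first paper has nothing to cite
    for new in range(1, n_papers):
        # Only papers that already exist can be read and cited.
        read = rng.sample(range(new), min(n_read, new))
        cited = set(read)
        for paper in read:
            # Follow one level of references: also cite one reference of each paper read.
            if references[paper]:
                cited.add(rng.choice(references[paper]))
        references[new] = sorted(cited)
    return references


def citation_counts(references):
    """Count how often each paper is cited."""
    counts = {paper: 0 for paper in references}
    for refs in references.values():
        for ref in refs:
            counts[ref] += 1
    return counts


refs = simulate_citations(300)
counts = citation_counts(refs)
print(sorted(counts.values(), reverse=True)[:5])
```

A paper that is already cited often appears in many reference lists, so it is more likely to be picked up again through reference following; no paper is ever selected because of its citation count directly.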


3. Simple Process Model

Simulates the simultaneous growth of co-author and paper-citation networks.

Authors come and go, papers are forever. Very few authors are able to co-author. All existing (but no future) papers can be cited.

[Figure: model workflow with boxes labeled Input Script, Model, Simulated Networks, and Data Analysis. Analysis outputs listed: N, <k>, l, C; degree distribution; article & author statistics; list of all authors & papers; co-citation, co-author, and author-paper networks; references and citations.]


Sample Input Script File

-------------------------------------------

Model Parameters (0=without, 1=with)

-------------------------------------------

0 Topics

0 Co-Authors

0 Consider References

-------------------------------------------

Model Initialization Values

-------------------------------------------

2 # Years

5 # Authors in Start Year

5 # Papers in Start Year

2 # Papers Consumed (Referenced) per Paper

1 # Papers Produced per Author each Year

5 # Topics

1 # Co-Author(s) per Author

1      # Levels References are Considered

Not shown are parameters that define the age of authors, the number of their active years, and the increase in the number of authors over the years.
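A script in this layout can be read with a few lines of code. The sketch below is an assumption about the file format inferred from the sample shown on this slide (a number followed by a label, with "#" marking initialization values and separator lines in between); the actual parser of the model is not shown in the slides, and `parse_script` is a hypothetical name.

```python
import re

SAMPLE_SCRIPT = """\
-------------------------------------------
Model Parameters (0=without, 1=with)
-------------------------------------------
0 Topics
0 Co-Authors
0 Consider References
-------------------------------------------
Model Initialization Values
-------------------------------------------
2 # Years
5 # Authors in Start Year
5 # Papers in Start Year
2 # Papers Consumed (Referenced) per Paper
1 # Papers Produced per Author each Year
5 # Topics
1 # Co-Author(s) per Author
1      # Levels References are Considered
"""

# A data line is an integer, whitespace, and a label; everything else is skipped.
DATA_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def parse_script(text):
    """Split a script into on/off model parameters and numeric initialization values."""
    flags, values = {}, {}
    for line in text.splitlines():
        match = DATA_LINE.match(line)
        if not match:
            continue  # separator lines, section headings, blank lines
        number, label = int(match.group(1)), match.group(2)
        if label.startswith("#"):
            values[label.lstrip("# ")] = number  # e.g. "2 # Years"
        else:
            flags[label] = bool(number)          # e.g. "0 Topics"
    return flags, values


flags, values = parse_script(SAMPLE_SCRIPT)
print(flags)
print(values)
```

Values that are not plain integers would need extra handling; this sketch simply skips any line that does not start with a whole number followed by whitespace.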


[Figure: three panels, Year 0 - Initialization, Year 1, and Year 2, showing the initial setup, first year, and second year topology of a simple author-paper network.]

Authors a1, a2, … are represented by blue circles.

Papers 1, 2, … are denoted by red triangles.

Red arrows indicate the information flow (via citation links) from older papers to more recent papers.

Green arrows denote consumed and produced paper-author relationships.

Arrows denote flow of information.


The Effect of Model Parameters

[Figure: simulated networks for parameter settings (000), (100) Topics, (010) Co-Authors, and (001) Reading References]

Co-authoring leads to fewer papers.

Topics lead to disconnected networks.


The Effect of Reading References

[Figure: paper citation networks after initialization plus 2 years, without considering references (000) and with reading references (001)]


4. Model Validation

The statistical and dynamic properties of the networks generated by this model are validated against a 20-year data set (1982-2001) of documents of type article published in the Proceedings of the National Academy of Sciences (PNAS). The data set contains about 106,000 unique authors, 472,000 co-author links, 45,120 papers cited within the set, and 114,000 citation references within the set. The PNAS paper network appears to have one giant component interconnecting 39,588 of the 45,120 papers that are cited by at least one paper in this data set.
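The size of a giant component such as the one reported above can be determined by a simple graph traversal. The sketch below is illustrative only: the function name and the toy data are assumptions, and it ignores link direction, which is one common way to define connectivity for citation networks but is not confirmed by the slides.

```python
def largest_component_size(nodes, links):
    """Return the number of nodes in the largest connected component (link direction ignored)."""
    adj = {node: set() for node in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal of the component that contains `start`.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Hypothetical toy citation data: papers 0-2 form one component, 3-4 another, 5 is isolated.
toy_papers = range(6)
toy_citations = [(1, 0), (2, 1), (4, 3)]
print(largest_component_size(toy_papers, toy_citations))
```

With the figures quoted above, 39,588 of 45,120 papers corresponds to roughly 88% of the cited papers being part of the giant component.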


Simple Statistics 20 Year Data Set

Used for initialization

Young papers have not yet garnered many citations.


PNAS Simulation Input Script File

-------------------------------------------

Model Parameters (0=without, 1=with)

-------------------------------------------

0 Topics

1 Co-Authors

1 Consider References

-------------------------------------------

Model Initialization Values

-------------------------------------------

21 # Years

4809    # Authors in Start Year

1624 # Papers in Start Year

392  # Additional Authors per Year

30 # Papers Referenced per Paper

1 # Papers Produced per Author each Year

4 # Co-Authors

1 # Levels References are Considered

The first year is used for initialization purposes.


Simple Statistics


Comparison PNAS & SIM

Total number of papers (#p), authors (#a), received citations (#c) and references (#r) for years 1982 through 2001.

Figure 7: Total number of papers (#p) and authors (#a) for years 1982 through 2001. The growing average number of references and received citations is displayed in Figure 8.


100 Year Simulation

[Figure: # papers over time, with 1982 and 2001 marked on the time axis and regions labeled "Papers cited by papers in X", "Papers in X", and "Papers citing papers in X"]

The 100 year simulation covers a larger network with similar characteristics.


Simple Statistics

The total # of received citations differs considerably from the citations received from papers in the 20 year PNAS data set!


Simple Statistics

Average # references per paper: ~30

Average # references to papers in 20 year PNAS data set: ~3


PNAS Simulation Input Script File

-------------------------------------------

Model Parameters (0=without, 1=with)

-------------------------------------------

0 Topics

1 Co-Authors

1 Consider References

-------------------------------------------

Model Initialization Values

-------------------------------------------

21 # Years

4809    # Authors in Start Year

1624 # Papers in Start Year

392  # Additional Authors per Year

30/3 # Papers Referenced per Paper

1 # Papers Produced per Author each Year

4 # Co-Authors

1 # Levels References are Considered


Comparison PNAS within 20 years & SIM 3 refs

Simple statistics match to a certain degree.


Power Law Distribution Exponents: PNAS and Simulation

Rsq d.f. F Sigf b0 b1
.926 204 2570.00 .000 8.3194 -1.5345

[Figure: Citation Distribution of PNAS Article Data; ln(frequ) vs. ln(ncited), observed values and linear fit]

Rsq d.f. F Sigf b0 b1
.877 70 497.88 .000 10.2251 -2.2378

[Figure: Citation Distribution of Simulated Data (SIM PNAS, 3 refs, without topics); ln(frequ) vs. ln(ncited), observed values and linear fit]


Systematic deviations from the power law are that the least-cited and most-cited papers are cited less often than predicted by a power-law, and the moderately-cited papers are cited more often than predicted.
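The regression output on this slide (Rsq, b0, b1 over ln(ncited) and ln(frequ)) is consistent with a straight-line fit on log-transformed data, where the slope b1 estimates the negative power-law exponent. The following is a minimal sketch of such a fit under that assumption; it is not the statistics package output shown on the slide, the function names and the toy data are hypothetical, and the d.f., F, and Sigf columns are not computed.

```python
import math
from collections import Counter


def citation_frequencies(citation_counts):
    """Map each citation count (> 0) to the number of papers that received it."""
    return Counter(c for c in citation_counts if c > 0)


def fit_log_log(freq):
    """Ordinary least squares of ln(frequency) on ln(citation count).

    Returns (b0, b1, rsq): intercept, slope, and coefficient of determination.
    """
    xs = [math.log(k) for k in freq]
    ys = [math.log(freq[k]) for k in freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, 1.0 - ss_res / ss_tot


# Hypothetical toy frequencies that follow f = 1024 * k^-2 exactly.
toy_freq = {1: 1024, 2: 256, 4: 64, 8: 16, 16: 4}
b0, b1, rsq = fit_log_log(toy_freq)
print(round(b0, 4), round(b1, 4), round(rsq, 4))
```

For the toy data the slope is -2 and the fit is exact. On real citation data, points that fall below the line at both ends and above it in the middle produce the kind of systematic deviation described above.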


100 Year Simulation Input Script File

----------------------------------------

Model Parameters (0=without, 1=with)

----------------------------------------

0/1 Topics

1 Co-Authors

1 Consider References

----------------------------------------

Model Initialization Values

----------------------------------------

100 # Years

100 # Authors in Initial Year

30 # Papers in Initial Year

3 # Papers Referenced per Paper

1 # Papers Produced per Author per Year

5 # Topics

4 # Co-Author(s) per Author

1 # of Levels References are Considered

Same author/paper ratio. Not 30 but 3 references.


Power Law Distribution Exponents: 100 Year Simulation

[Figure: Citation Distribution of 100 Year Simulation, two panels; ln(frequ) vs. ln(ncited), observed values and linear fit]

Rsq d.f. F Sigf b0 b1
.898 86 759.42 .000 7.3336 -1.6121
with topics

Rsq d.f. F Sigf b0 b1
.820 96 436.51 .000 6.1883 -1.2961
without topics

If topics are considered, the distribution shows the same systematic deviations from a power law as observed for the PNAS article data set.


Properties of PNAS & Simulated Networks

Topics increase the clustering coefficient C.

Papers of highly cited authors are cited more.


5. Discussion

This paper presented first results on modeling the simultaneous evolution and structure of author-paper networks.

Unique Features:
• Author and paper networks grow simultaneously.
• Model uses the reading and citing of paper references as a grounded mechanism to generate scale free paper citation networks.

Topics appear to be important to model clustering coefficients observed in real world paper citation networks.

Topics also lead to distributions of citation frequencies that show the same systematic deviations from a power law as observed for the PNAS article data set.


Implemented in the model is the ‘aging’ or rather ‘deactivation of authors’. If authors are more likely to cite papers of active authors, then the deactivation of all authors of a paper would decrease the ‘attraction’ or ‘fitness’ of the paper to receive citations from other papers. The deactivation of authors would also cause previous co-authors to search for new co-authors.

For the sake of simplicity we fixed the number of papers produced by each author per year and fixed the number of co-authors. To model the rich get richer effect for co-author networks, we plan to have authors co-author with co-authors of their co-authors.

The productivity of an author may depend not only on his/her position in the author-paper network but also on available research funds, facilities, and students. Information on grant support could be modeled as a third network. The other data are harder to come by.


6. Challenges & Opportunities

Clearly, further validation of the model with different parameter settings and other data sets is necessary.

The model should be extended to consider the interactions with a third network of grant data.

Of particular interest to us are studies on the evolution of topic fields, growth by differentiation, and the speed of knowledge diffusion for more or less clustered networks.


Acknowledgements

This work greatly benefited from discussions with Mark Newman. He also made his code available to determine the small world properties. Kevin W. Boyack provided insightful comments on an earlier version of this paper.

Pajek (Batagelj & Mrvar, 1998) was used to draw the graphs. It is available at http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

The data used in this paper was extracted from Science Citation Index Expanded – the Institute for Scientific Information®, Inc. (ISI®), Philadelphia, Pennsylvania, USA: © Copyright Institute for Scientific Information®, Inc. (ISI®). All rights reserved. No portion of this data set may be reproduced or transmitted in any form or by any means without prior written permission of the publisher.