Analysis of reference and citation copying in evolving bibliographic networks

Effect of citation patterns on network structure

We propose a model for an evolving citation network that incorporates the citation pattern followed in a particular discipline. We define the citation pattern in a discipline by three factors: the average number of references per article, the probability of citing an article based on its age, and the number of citations it already has. We also consider the average number of articles published per year in the discipline. We propose that the probability of citing an article based on its age can be modeled by a lifetime distribution. The lifetime distribution models the citation lifetime of an average article in a particular discipline. We find that the citation lifetime distribution in a particular discipline predicts the topological structure of the citation network in that discipline. We show that the power law exponent depends on the three factors that define the citation pattern. Finally, we fit the data from the Physical Review D journal to obtain the citation pattern and calculate the total degree distribution for the citation network.

Growing complex network of citations of scientific papers: Modeling and measurements

Physical Review E

To quantify the mechanism of complex network growth, we focus on the network of citations of scientific papers and use a combination of theoretical and experimental tools to uncover the microscopic details of this growth. Namely, we develop a stochastic model of citation dynamics based on a copying/redirection/triadic closure mechanism. In a complementary and coherent way, the model accounts both for the statistics of references of scientific papers and for their citation dynamics. Originating in empirical measurements, the model is cast in such a way that it can be verified quantitatively in every aspect. Such verification is performed by measuring the citation dynamics of Physics papers. The measurements revealed nonlinear citation dynamics, the nonlinearity being intricately related to network topology. The nonlinearity has far-reaching consequences, including non-stationary citation distributions, diverging citation trajectories of similar papers, and runaways or "immortal papers" with infinite citation lifetime. Thus, our most important finding is nonlinearity in complex network growth. In a more specific context, our results can be a basis for quantitative probabilistic prediction of the citation dynamics of individual papers and of the journal impact factor.
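The copying/redirection idea mentioned in this abstract can be illustrated with a generic toy model. The sketch below is not the authors' calibrated model: the function name `grow_by_copying`, the single direct citation per paper, and the constant `copy_prob` are all assumptions made for illustration.

```python
import random

def grow_by_copying(n_papers, copy_prob, seed=0):
    """Grow a toy citation network by direct citation plus reference copying.

    Assumed rule (illustrative only): each new paper cites one earlier paper
    chosen uniformly at random, then copies each of that paper's references
    independently with probability copy_prob.
    Returns a dict mapping paper index -> set of cited paper indices.
    """
    rng = random.Random(seed)
    references = {0: set()}  # the first paper has nothing to cite
    for new in range(1, n_papers):
        direct = rng.randrange(new)  # direct citation to an earlier paper
        refs = {direct}
        for indirect in references[direct]:
            if rng.random() < copy_prob:
                # Copied reference: closes the triangle new -> direct -> indirect
                refs.add(indirect)
        references[new] = refs
    return references
```

Calling `grow_by_copying(300, 0.5)` returns the reference lists of 300 papers. Setting `copy_prob` to 0 reduces the rule to uniform random citation with exactly one reference per paper.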

Aging in citation networks

Physica A-statistical Mechanics and Its Applications, 2005

In many growing networks, the age of the nodes plays an important role in deciding the attachment probability of the incoming nodes. For example, in a citation network, very old papers are seldom cited while recent papers are usually cited with high frequency. We study actual citation networks to find out the distribution $T(t)$ of $t$, the time interval between the published and the cited paper. For different sets of data we find a universal behaviour: $T(t) \sim t^{-0.9}$ for $t \le t_c$ and $T(t) \sim t^{-2}$ for $t > t_c$, where $t_c \sim O(10)$.
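The two-regime form of T(t) can be written as a small function. This is a sketch under stated assumptions: the abstract gives only the two exponents and the order of magnitude of the crossover, so the default `crossover=10.0`, the function name `interval_weight`, and the choice to make the two branches continuous at the crossover are illustrative, and the result is unnormalised.

```python
def interval_weight(t, crossover=10.0, early_exp=0.9, late_exp=2.0):
    """Unnormalised T(t): t**-early_exp up to the crossover, t**-late_exp after.

    The late branch is rescaled so the two branches agree at t == crossover
    (continuity is an assumption, not something stated in the abstract).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if t <= crossover:
        return t ** -early_exp
    scale = crossover ** (late_exp - early_exp)  # matches the branches at the crossover
    return scale * t ** -late_exp
```

Beyond the crossover, doubling the interval divides the weight by four, as expected for an exponent of 2.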

Modelling aging characteristics in citation networks

Physica A-statistical Mechanics and Its Applications, 2006

Growing network models with preferential attachment dependent on both age and degree are proposed to simulate certain features of citation network noted in . In this directed network, a new node gets attached to an older node with the probability $\sim K(k) f(t)$, where the degree and age of the older node are $k$ and $t$ respectively. Several functional forms of $K(k)$ and $f(t)$ have been considered. The desirable features of the citation network can be reproduced with $K(k) \sim k^{-\beta}$ and $f(t) \sim \exp(\alpha t)$ with $\beta = 2.0$ and $\alpha = -0.2$ and with simple modifications in the growth scheme.
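The attachment rule with probability proportional to K(k)f(t) can be sketched as a short simulation. Several details are assumptions rather than the paper's scheme: one citation per new node, age measured in node-arrival steps, in-degree used as k, and a `+1` offset inside the degree kernel so that nodes with k = 0 remain citable. The "simple modifications in the growth scheme" mentioned in the abstract are not reproduced, and the names `attachment_weights` and `grow_network` are hypothetical.

```python
import math
import random

def attachment_weights(degrees, ages, degree_exponent, age_rate):
    """Unnormalised weights K(k) * f(t) for every existing node.

    K(k) = (k + 1) ** degree_exponent  (the +1 offset is an assumption)
    f(t) = exp(age_rate * t)
    """
    return [((k + 1) ** degree_exponent) * math.exp(age_rate * t)
            for k, t in zip(degrees, ages)]

def grow_network(n_nodes, degree_exponent, age_rate, seed=0):
    """Add nodes one at a time; each new node cites one older node."""
    rng = random.Random(seed)
    in_degree = [0]  # node 0 starts the network
    edges = []
    for new in range(1, n_nodes):
        ages = [new - old for old in range(new)]  # age in arrival steps
        weights = attachment_weights(in_degree, ages, degree_exponent, age_rate)
        target = rng.choices(range(new), weights=weights, k=1)[0]
        edges.append((new, target))
        in_degree[target] += 1
        in_degree.append(0)
    return edges, in_degree
```

For example, `grow_network(200, degree_exponent=-2.0, age_rate=-0.2)` uses the exponents as they are written in the abstract above. Passing a positive `degree_exponent` gives a conventional preferential-attachment kernel for comparison.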

The simultaneous evolution of author and paper networks

Proceedings of the National Academy of Sciences, 2004

There has been a long history of research into the structure and evolution of mankind's scientific endeavor. However, recent progress in applying the tools of science to understand science itself has been unprecedented because only recently has there been access to high-volume and high-quality data sets of scientific output (e.g., publications, patents, grants), as well as computers and algorithms capable of handling this enormous stream of data. This paper reviews major work on models that aim to capture and recreate the structure and dynamics of scientific evolution. We then introduce a general process model that simultaneously grows co-author and paper-citation networks. The statistical and dynamic properties of the networks generated by this model are validated against a 20-year data set of articles published in the Proceedings of the National Academy of Sciences. Systematic deviations from a power law distribution of citations to papers are well fit by a model that incorporates a partitioning of authors and papers into topics, a bias for authors to cite recent papers, and a tendency for authors to cite papers cited by papers that they have read. In this TARL model (for Topics, Aging, and Recursive Linking), the number of topics is linearly related to the clustering coefficient of the simulated paper citation network.

Modeling scientific-citation patterns and other triangle-rich acyclic networks

2009

The boom in network studies over the last decade [1, 2] potentially has an impact on the structure of science itself. Network measures can help create better bibliometric quantities to evaluate scientific impact [3] and to study the sociological aspects of scientific collaboration and the exchange of ideas. Indeed, the study of scientific citations has become a subfield of complex network studies [4–13]. One typical feature of academic citation networks is that the number of citations to a paper decreases with its age.

Large-scale structure of time evolving citation networks

The European Physical Journal B, 2007

In this paper we examine a number of methods for probing and understanding the large-scale structure of networks that evolve over time. We focus in particular on citation networks, networks of references between documents such as papers, patents, or court cases. We describe three different methods of analysis, one based on an expectation-maximization algorithm, one based on modularity optimization, and one based on eigenvector centrality. Using the network of citations between opinions of the United States Supreme Court as an example, we demonstrate how each of these methods can reveal significant structural divisions in the network, and how, ultimately, the combination of all three can help us develop a coherent overall picture of the network's shape.

Statistical modeling of the temporal dynamics in a large scale-citation network

2016

Citation networks of papers are vast networks that grow over time. The way a citation network grows is not entirely random but follows a preferential attachment relationship: highly cited papers are more likely to be cited by newly published papers. The result is a network whose degree distribution follows a power law. This growth of the citation network of papers will be modeled with a negative binomial regression coupled with a logistic growth and/or Cauchy distribution curve. Then a Barabási-Albert model based on the negative binomial models, and a combination of the Dirichlet and multinomial distributions, will be utilized to simulate a network that follows preferential attachment between newly added nodes and existing nodes.

Universal hierarchical behavior of citation networks

Journal of Statistical Mechanics: Theory and Experiment, 2014

Many of the essential features of the evolution of scientific research are imprinted in the structure of citation networks. Connections in these networks imply information about the transfer of knowledge among papers, or in other words, edges describe the impact of papers on other publications. This inherent meaning of the edges implies that citation networks can exhibit hierarchical features, which are typical of networks based on decision-making. In this paper, we investigate the hierarchical structure of citation networks consisting of papers in the same field. We find that the majority of the networks follow a universal trend towards a highly hierarchical state, and i) the various fields display differences only concerning their phase in life (distance from the "birth" of a field) or ii) the characteristic time according to which they are approaching the stationary state. We also show by a simple argument that the alterations in the behavior are related to and can be understood by the degree of specialization corresponding to the fields. Our results suggest that during the accumulation of knowledge in a given field, some papers are gradually becoming relatively more influential than most of the other papers.

Characterizing and Modeling Citation Dynamics

PLoS ONE, 2011

Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function which best describes the observed citation distributions. We used the goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: lognormal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years of publication of a paper, and the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks, by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.
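The Kolmogorov-Smirnov comparison against a shifted power law can be sketched as follows. The abstract does not give the exact functional form or fitting procedure, so this assumes a discrete distribution P(k) proportional to (k + shift)^(-exponent) on k = 0..k_max, truncated at the largest observed count. The function names are hypothetical and no parameter fitting is performed.

```python
def shifted_power_law_cdf(k_max, shift, exponent):
    """CDF on k = 0..k_max of P(k) proportional to (k + shift) ** -exponent.

    Truncating the support at k_max is an assumption made so the
    normalisation is a finite sum; shift must be positive.
    """
    weights = [(k + shift) ** -exponent for k in range(k_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf

def ks_distance(citation_counts, shift, exponent):
    """Largest gap between the empirical CDF and the model CDF."""
    k_max = max(citation_counts)
    model = shifted_power_law_cdf(k_max, shift, exponent)
    histogram = [0] * (k_max + 1)
    for c in citation_counts:
        histogram[c] += 1
    n = len(citation_counts)
    running, worst = 0, 0.0
    for k in range(k_max + 1):
        running += histogram[k]
        worst = max(worst, abs(running / n - model[k]))
    return worst
```

A smaller `ks_distance` indicates a closer match. Comparing the value across candidate `(shift, exponent)` pairs, or across distribution families, is one simple way to rank hypotheses for a given list of citation counts.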