Nonuniversal power law scaling in the probability distribution of scientific citations

Power-law citation distributions are not scale-free

Physical Review E, 2017

We analyze the time evolution of statistical distributions of citations to scientific papers published in one year. While these distributions can be fitted by a power-law dependence, we find that they are nonstationary: the exponent of the power-law fit decreases with time and does not saturate. We attribute the nonstationarity of citation distributions to the different longevity of low-cited and highly cited papers. By measuring citation trajectories of papers, we found that the citation careers of low-cited papers saturate after 10-15 years, while those of highly cited papers continue to increase indefinitely: papers that exceed some citation threshold become runaways. Thus, we show that although a citation distribution can look like a power law, it is not scale-free, and there is a hidden dynamic scale associated with the onset of runaways. We compare our measurements to our recently developed model of citation dynamics based on copying/redirection/triadic closure and find explanations for our empirical observations.

Growing complex network of citations of scientific papers: Modeling and measurements

Physical Review E

To quantify the mechanism of complex network growth, we focus on the network of citations of scientific papers and use a combination of theoretical and experimental tools to uncover the microscopic details of this growth. Namely, we develop a stochastic model of citation dynamics based on a copying/redirection/triadic closure mechanism. In a complementary and coherent way, the model accounts both for the statistics of references of scientific papers and for their citation dynamics. Originating in empirical measurements, the model is cast in such a way that it can be verified quantitatively in every aspect. Such verification is performed by measuring the citation dynamics of Physics papers. The measurements revealed nonlinear citation dynamics, the nonlinearity being intricately related to network topology. The nonlinearity has far-reaching consequences, including nonstationary citation distributions, diverging citation trajectories of similar papers, and runaways or "immortal papers" with infinite citation lifetime. Thus, our most important finding is nonlinearity in complex network growth. In a more specific context, our results can be a basis for quantitative probabilistic prediction of the citation dynamics of individual papers and of the journal impact factor.
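The copying/redirection idea named in this abstract can be illustrated with a generic toy model. The sketch below is not the authors' model: it is a minimal textbook-style copying process in which each new paper cites a few earlier papers at random and then copies each of their references with a fixed probability. The function name `grow_citation_network` and the parameters `n_direct` and `p_copy` are hypothetical choices made for illustration.

```python
import random


def grow_citation_network(n_papers, n_direct=2, p_copy=0.3, seed=0):
    """Toy copying model (illustrative assumption, not the published model).

    Each new paper picks up to `n_direct` earlier papers uniformly at random
    (direct citations), then copies each reference of those papers with
    probability `p_copy` (indirect citations, which close triangles).
    Returns (references, citations): a list of reference sets per paper and
    the number of citations each paper received.
    """
    rng = random.Random(seed)
    references = [set()]  # paper 0 has nothing to cite
    for new in range(1, n_papers):
        direct = rng.sample(range(new), min(n_direct, new))
        cited = set(direct)
        for parent in direct:
            # sorted() keeps the random draws in a reproducible order
            for ref in sorted(references[parent]):
                if rng.random() < p_copy:
                    cited.add(ref)
        references.append(cited)

    citations = [0] * n_papers
    for refs in references:
        for target in refs:
            citations[target] += 1
    return references, citations


if __name__ == "__main__":
    refs, cites = grow_citation_network(2000)
    print("most cited paper has", max(cites), "citations")
```

Because copied references point to papers that are already cited by the chosen parent, papers with many citations are more likely to be reached indirectly, which is one simple way a rich-get-richer effect can arise without an explicit preferential-attachment rule.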

Runaway events dominate the heavy tail of citation distributions

2012

Statistical distributions with heavy tails are ubiquitous in natural and social phenomena. Since the entries in the heavy tail have disproportionate significance, knowledge of its exact shape is very important. Citations of scientific papers form one of the best-known heavy-tailed distributions. Even in this case there is considerable debate over whether the citation distribution follows a log-normal or a power-law fit. The goal of our study is to resolve this debate by measuring the citation distribution for a very large and homogeneous dataset. We measured the citation distribution for 418,438 Physics papers published in 1980-1989 and cited by 2008. While the log-normal fit deviates too strongly from the data, the discrete power-law function with the exponent γ = 3.15 does better and fits 99.955% of the data. However, the extreme tail of the distribution deviates upward even from the power-law fit and exhibits a dramatic "runaway" behavior. The onset of the runaway regime is revealed macroscopically as the paper garners 1000-1500 citations; however, microscopic measurements of the autocorrelation in citation rates are able to predict this behavior in advance.
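As a rough illustration of how an exponent such as γ can be estimated from citation counts, the sketch below uses the standard approximate maximum-likelihood estimator for a discrete power law, γ ≈ 1 + n / Σ ln(x_i / (x_min - 1/2)). This is a common general-purpose estimator and is not necessarily the fitting procedure used in the paper; the function name and the cutoff `x_min` are assumptions made for this example, and the approximation is known to be less accurate for small `x_min`.

```python
import math


def estimate_power_law_exponent(citations, x_min=1):
    """Approximate MLE of gamma for p(x) ~ x**(-gamma), x >= x_min.

    Uses the continuous approximation to the discrete estimator:
        gamma ~= 1 + n / sum(ln(x_i / (x_min - 0.5)))
    Observations below `x_min` (for example uncited papers) are ignored.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1")
    tail = [x for x in citations if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Hypothetical citation counts, for illustration only.
    sample = [0, 1, 1, 2, 3, 3, 5, 8, 12, 20, 41, 150]
    print(estimate_power_law_exponent(sample, x_min=3))
```

A fit of this kind only describes the bulk of the tail; by construction it cannot reveal the upward deviation of the extreme tail that the abstract describes, which has to be checked by comparing the fitted curve with the data.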

Characterizing and Modeling Citation Dynamics

PLoS ONE, 2011

Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function that best describes the observed citation distributions. We used goodness of fit with Kolmogorov-Smirnov statistics for three classes of functions: lognormal, simple power law, and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all the citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years of a paper's publication, and that the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and also accounts for the presence of citation bursts.

The Transition Towards Immortality: Non-linear Autocatalytic Growth of Citations to Scientific Papers

Journal of Statistical Physics, 2013

We discuss microscopic mechanisms of complex network growth, with special emphasis on how these mechanisms can be evaluated from measurements on real networks. As an example we consider the network of citations to scientific papers. Contrary to the common belief that its growth is determined by linear preferential attachment, our microscopic measurements show that it is driven by nonlinear autocatalytic growth. This invalidates the scale-free hypothesis for the citation network. The nonlinearity is responsible for a dramatic dynamical phase transition: while the citation lifetime of the majority of papers is 6-10 years, the highly cited papers have a practically infinite lifetime.

Keywords: power-law distribution • citations • preferential attachment • complex networks • autocatalytic growth

Statistical modeling of the temporal dynamics in a large scale-citation network

2016

Citation networks of papers are vast networks that grow over time. The way a citation network grows is not entirely random but follows a preferential-attachment relationship: highly cited papers are more likely to be cited by newly published papers. The result is a network whose degree distribution follows a power law. This growth of the citation network of papers will be modeled with a negative binomial regression coupled with a logistic growth and/or Cauchy distribution curve. Then a Barabási-Albert model based on the negative binomial models, together with a combination of the Dirichlet and multinomial distributions, will be used to simulate a network that follows preferential attachment between newly added nodes and existing nodes.

Effect of citation patterns on network structure

We propose a model for an evolving citation network that incorporates the citation pattern followed in a particular discipline. We define the citation pattern in a discipline by three factors: the average number of references per article, the probability of citing an article based on its age, and the number of citations it already has. We also consider the average number of articles published per year in the discipline. We propose that the probability of citing an article based on its age can be modeled by a lifetime distribution, which describes the citation lifetime of an average article in a particular discipline. We find that the citation lifetime distribution in a particular discipline predicts the topological structure of the citation network in that discipline. We show that the power-law exponent depends on the three factors that define the citation pattern. Finally, we fit the data from the Physical Review D journal to obtain the citation pattern and calculate the total degree distribution for the citation network.

Universality of citation distributions: A new understanding

Quantitative Science Studies, 2021

Universality of scaled citation distributions was claimed a decade ago, but its theoretical justification has been lacking so far. Here, we study citation distributions for three disciplines (Physics, Economics, and Mathematics) and assess them using our explanatory model of citation dynamics. The model posits that the citation count of a paper is determined by its fitness, an attribute that, for most papers, is set at the moment of publication. In addition, a paper's citation count is related to the process by which knowledge about the paper propagates in the scientific community. Our measurements indicate that the fitness distribution for different disciplines is nearly identical and can be approximated by the log-normal distribution, while the viral propagation process is discipline specific. The model explains which sets of citation distributions can be scaled and which cannot. In particular, we show that the near-universal shape of the citation distributions for differen...

Stochastic Dynamical Model of a Growing Citation Network Based on a Self-Exciting Point Process

Physical Review Letters, 2012

We put under experimental scrutiny the preferential attachment model that is commonly accepted as a generating mechanism of scale-free complex networks. To this end we chose a citation network of physics papers and traced the citation history of 40,195 papers published in one year. Contrary to common belief, we find that the citation dynamics of individual papers follows superlinear preferential attachment, with an exponent of 1.25-1.3. Moreover, we show that the citation process cannot be described as a memoryless Markov chain, since there is a substantial correlation between the present and recent citation rates of a paper. Based on our findings we construct a stochastic growth model of the citation network, perform numerical simulations based on this model, and achieve excellent agreement with the measured citation distributions.
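To make the notion of superlinear preferential attachment concrete, the following sketch simulates a growing network in which a paper with k citations is chosen with probability proportional to (k + k0)^α. It is only a minimal illustration under stated assumptions, not the stochastic model constructed in the paper: it ignores aging and the correlations between recent and present citation rates that the abstract emphasizes. The function name, the offset `k0`, and the default `alpha=1.25` (taken from the lower end of the range quoted above) are choices made for this example.

```python
import random


def simulate_superlinear_attachment(n_papers, refs_per_paper=5,
                                    alpha=1.25, k0=1.0, seed=0):
    """Grow a toy citation network with attachment weight (k + k0)**alpha.

    alpha = 1 gives linear preferential attachment; alpha > 1 is superlinear.
    `k0` is an assumed offset so that uncited papers can still be chosen.
    Each new paper draws `refs_per_paper` targets with replacement and cites
    each distinct target once. Returns the citation count of every paper.
    """
    rng = random.Random(seed)
    citations = [0]  # start from a single uncited paper
    for new in range(1, n_papers):
        weights = [(k + k0) ** alpha for k in citations]
        targets = set(rng.choices(range(new), weights=weights,
                                  k=refs_per_paper))
        for t in targets:
            citations[t] += 1
        citations.append(0)  # the new paper enters with no citations
    return citations


if __name__ == "__main__":
    counts = simulate_superlinear_attachment(3000)
    print("top five citation counts:", sorted(counts, reverse=True)[:5])
```

Running the simulation with `alpha=1.0` and then with `alpha=1.25` and comparing the largest citation counts gives a quick qualitative feel for how a superlinear exponent concentrates citations on a few papers. The loop recomputes all weights for each new paper, so it is quadratic in `n_papers` and intended only for small experiments.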