Is the Relationship Between Journal Impact Factors and Article Citations Growing Weaker? - The Scholarly Kitchen
Three information scientists at the Université du Québec à Montréal are claiming that digital journal publication has resulted in a weakening relationship between the article and the journal in which it is published. They also claim that highly-cited articles are increasingly being found in non-highly-cited journals, resulting in a slow erosion of the predictive value of the journal impact factor.
Their article, “The weakening relationship between the Impact Factor and papers’ citations in the digital age,” by George Lozano and colleagues, was published in the October issue of the Journal of the American Society for Information Science and Technology (JASIST). Their manuscript can also be found on the arXiv.
The paper continues a controversy between those who believe that digital publishing is narrowing the attention of scientists and those who believe it is expanding it. There are theories to support each viewpoint.
Because testing this hypothesis requires access to a huge dataset of citation data, one that far exceeds what most of us can acquire and process, we need to focus on evaluating the specific methods of each paper. Different kinds of analysis can return very different results, especially if the researchers violate the assumptions behind their statistical tests. Large observational studies without proper controls may detect differences that are actually the result of some other underlying cause.
No study is perfect, but the authors need to be clear about what they cannot conclude with certainty.
The Lozano paper is based on measuring the relationship between the citation performance of individual articles and the impact factor of the journal that published them. The authors do this by calculating the coefficient of determination (R²), which measures the goodness of fit between a regression line and the data it attempts to model. Lozano repeats this calculation for every year from 1900 to 2011, plots the results, and then attempts to fit a new regression line through these yearly values.
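To make the approach concrete, here is a minimal sketch of that kind of per-year calculation. This is not the authors' code; the `articles` table and its column names are hypothetical stand-ins for the Web of Science data they purchased.

```python
# Sketch of a per-year R^2 calculation (not the authors' code).
# `articles` is a hypothetical table with one row per article and columns:
#   year        - publication year
#   citations   - citations received by that article
#   journal_if  - impact factor of the journal that published it
import pandas as pd
from scipy.stats import linregress

def yearly_r_squared(articles: pd.DataFrame) -> pd.Series:
    """Regress article citations on journal impact factor within each year
    and return the coefficient of determination (R^2) per year."""
    r2 = {}
    for year, grp in articles.groupby("year"):
        fit = linregress(grp["journal_if"], grp["citations"])
        r2[year] = fit.rvalue ** 2
    return pd.Series(r2).sort_index()

# The paper then fits a second regression line through these yearly R^2 values
# to argue for a trend over time, roughly:
#   yearly = yearly_r_squared(articles)
#   trend = linregress(yearly.index, yearly.values)
```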
There are a number of methodological problems with using this approach:
- R² assumes that the data are independent observations, which they are not. If we plot each article’s citations (X) against the impact factor of its journal (Y), then Y is not free to vary: every article from the same journal carries exactly the same value. Moreover, the journal impact factor is itself calculated from the citation performance of those articles, so the two numbers are intrinsically correlated. The result is that larger journals and journals with higher impact factors exert a disproportionate influence on the calculated R².
- Attempting to fit a grand regression line through the R² values for each year also assumes that a journal’s impact factor is independent from year to year, which, by definition, it is not. The impact factor for year Y is correlated with the impact factor for year Y+1 because half of the articles counted in one year’s impact factor are still being counted in the construction of the next year’s (a toy simulation of this point follows the list).
- The authors assume that the journal article dataset is consistent over time. Lozano uses data from the Web of Science, which has been adding journals throughout its lifetime. In 2011, for example, it greatly expanded its coverage of regional journals, and over a decade ago it added a huge backfile of historical material, the Century of Science. The authors do not control for the growth in the number of journals in the dataset, nor for the growth of citations, meaning that their results could be an artifact of the dataset rather than an underlying phenomenon.
- Last, the authors assume natural breakpoints in their dataset, and those breakpoints appear somewhat arbitrary. While the authors postulate a change starting in 1990, they also create a breakpoint at 1960 but ignore other obvious candidates (look at what happens beginning in 1980, for instance). If you eyeball their figures, you can draw many different lines through the data points, one for every hypothesis you wish to support. Many developments in digital publishing since the 1990s never enter the discussion. While the authors try to draw a causal connection between the arXiv and physics citations, for example, they make no attempt to consider other explanations, such as institutional and consortial licensing, journal bundling (aka “the Big Deal”), or the widespread adoption of email, listservs, and the graphical web browser.
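To illustrate the year-to-year dependence problem, here is a toy simulation, entirely my own and not drawn from the paper. Even when every journal’s citation performance for each publication-year cohort is drawn independently, consecutive impact factors come out correlated simply because their two-year windows share a cohort.

```python
# Toy simulation (not from the paper): consecutive impact factors share one
# publication cohort, so IF(Y) and IF(Y+1) are correlated even when each
# cohort's citation performance is independent of every other cohort's.
import numpy as np

rng = np.random.default_rng(0)
n_journals, n_years = 2000, 12

# Independent "citations per article" for each journal/publication-year cohort.
cohort_rate = rng.gamma(shape=2.0, scale=1.5, size=(n_journals, n_years))

# The impact factor for year Y averages the two preceding cohorts (Y-1, Y-2).
ifs = (cohort_rate[:, 1:-1] + cohort_rate[:, :-2]) / 2  # columns = IF years

# Correlation between consecutive yearly impact factors, pooled across journals.
r = np.corrcoef(ifs[:, :-1].ravel(), ifs[:, 1:].ravel())[0, 1]
print(f"correlation between IF(Y) and IF(Y+1): {r:.2f}")  # ~0.5, not 0
```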
The paper reads as if its conclusions had been written ahead of the analysis, conclusions that include the following:
> Online, open-access journals, such as in the PLoS family of journals, and online databases, such as the ArXiv system and its cognates, will continue to gain prominence. Using these open-access repositories, experts can find publications in their respective fields and decide which ones are worth reading and citing, regardless of the journal. Should the relationship between IFs and papers’ citations continue to weaken, the IF will slowly lose its legitimacy as an indicator of the quality of journals, papers, and researchers.
This may explain the cheers heard around the altmetrics communities when this article was first published.
I’ve had a couple of great discussions with colleagues about this paper. We all agree that Lozano and his group are sitting on a very valuable dataset that is nearly impossible to construct without purchasing the data from Thomson Reuters. My requests to see their data (even a subset thereof) for validation purposes have gone unheeded. New papers with new analyses are forthcoming, I was told.
Those discussions surfaced two different ways to analyze their dataset. Tim Vines suggested using the coefficient of variation, which is a more direct way to measure the distribution of citations and controls for the performance of each journal. I suggested setting up the analysis as a repeated-measures design, in which the performance of each journal is observed every year over the course of the study. We all agreed that the authors are posing an interesting question and have the data to answer it. Unfortunately, the authors seemed too eager to draw strong conclusions from inappropriate and rudimentary analyses, and their unwillingness to share their data for validation purposes does not give me confidence in their results.
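As a rough illustration of the first suggestion, here is a minimal sketch, again assuming the hypothetical `articles` table from the earlier sketch: for each journal in each year, the spread of its articles’ citations is scaled by their mean, so the measure is comparable across journals with very different citation levels.

```python
# Minimal sketch of a per-journal, per-year coefficient of variation of
# article citations (SD / mean), using the hypothetical `articles` table
# with columns: year, journal, citations.
import pandas as pd

def citation_cv(articles: pd.DataFrame) -> pd.DataFrame:
    grouped = articles.groupby(["year", "journal"])["citations"]
    cv = grouped.std(ddof=1) / grouped.mean()
    return cv.rename("cv").reset_index()
```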