Citation needed? Wikipedia bibliometrics during the first wave of the COVID-19 pandemic

Omer Benjakob et al. Gigascience. 2022.

Abstract

Background: With the COVID-19 pandemic's outbreak, millions flocked to Wikipedia for updated information. Amid growing concerns regarding an "infodemic," ensuring the quality of information is a crucial vector of public health. Investigating whether and how Wikipedia remained up to date and in line with science is key to formulating strategies to counter misinformation. Using citation analyses, we asked which sources informed Wikipedia's COVID-19-related articles before and during the pandemic's first wave (January-May 2020).

Results: We found that coronavirus-related articles referenced trusted media outlets and high-quality academic sources. Regarding academic sources, Wikipedia was found to be highly selective in terms of what science was cited. Moreover, despite a surge in COVID-19 preprints, Wikipedia had a clear preference for open-access studies published in respected journals and made little use of preprints. Building a timeline of English-language COVID-19 articles from 2001-2020 revealed a nuanced trade-off between quality and timeliness. It further showed how pre-existing articles on key topics related to the virus created a framework for integrating new knowledge. Supported by a rigid sourcing policy, this "scientific infrastructure" facilitated contextualization and regulated the influx of new information. Last, we constructed a network of DOI-Wikipedia articles, which showed the landscape of pandemic-related knowledge on Wikipedia and how academic citations create a web of shared knowledge supporting topics like COVID-19 drug development.

Conclusions: Understanding how scientific research interacts with the digital knowledge-sphere during the pandemic provides insight into how Wikipedia can facilitate access to science. It also reveals how, aided by what we term its "citizen encyclopedists," it successfully fended off COVID-19 disinformation and how this unique model may be deployed in other contexts.

Keywords: COVID-19; Wikipedia; bibliometrics; citizen science; infodemic; open science; sources.

© The Author(s) 2022. Published by Oxford University Press GigaScience.

Conflict of interest statement

Omer Benjakob is a journalist for Haaretz and has written about Wikipedia in the past.

Figures

Figure 1:

Characterization of scientific sources of the Wikipedia COVID-19 corpus. (A) Bar plot of the most cited academic sources. Top journals are highlighted in green and preprints are represented in red. Bottom right: Box plot of Altmetrics scores of the 3 sets: the Wikipedia COVID-19 corpus, the EuroPMC COVID-19 search, and the full Wikipedia dump as of May 2020. Comparison of the occurrence of (B) open-access sources and (C) preprints (medRxiv and bioRxiv) in the 3 sets. In each box plot, the center line indicates the median, and the bottom and top edges indicate the 25th and 75th percentiles; the whiskers extend 1.5 times the interquartile range.
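
The box plot convention described in the caption (median, 25th and 75th percentiles, whiskers at 1.5 times the interquartile range) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `box_plot_summary` and the sample scores are hypothetical, and it assumes the common convention in which each whisker stops at the most extreme data point inside the 1.5 × IQR limit.

```python
import statistics


def box_plot_summary(values):
    """Return the five box plot landmarks plus outliers for a list of numbers.

    Assumption: whiskers end at the most extreme data points that still lie
    within 1.5 * IQR of the box edges; points beyond them are outliers.
    """
    data = sorted(values)
    # Quartiles computed with linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical scores, not data from the study.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With these made-up scores, the value 30 falls above the upper limit and is reported as an outlier rather than stretching the whisker.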

Figure 2:

Top sources used in the Wikipedia COVID-19 corpus: (A) source types, (B) news agencies, (C) websites, and (D) publishers from the COVID-19 corpus sources (per Wikipedia’s citation template terminology). Several denominations for the same institution are present in the raw data, as highlighted here by the example of "WHO" and "World Health Organization."

Figure 3:

Historical perspective of the Wikipedia COVID-19 corpus. (A) COVID-19 article creation per year; inset: number of articles created before and after 2020. (B) Scientific citations added per year to the COVID-19 corpus and globally in Wikipedia (inset). Latency distribution of scientific papers (C) in the COVID-19 corpus and (D) the Wikipedia dump. See Supplementary Fig. S3 and the GigaDB repository [54] for an interactive version of the timeline.

Figure 4:

Network of articles–scientific papers (DOI) in the Wikipedia COVID-19 corpus. A network mapping scientific papers (with DOIs) cited in >1 article in the Wikipedia COVID-19 corpus was constructed. This network is composed of 454 edges, 179 DOIs (blue), and 136 Wikipedia articles (yellow). Nodes represent articles and their size is proportional to the number of connections. A zoom in on the cluster of Wikipedia articles dealing with COVID-19 drug development is depicted here for illustrative purposes. For clarity, edges marked in red indicate those connecting the DOIs cited directly in the “COVID-19 drug development” article and edges marked in blue indicate those connecting these DOIs to other articles citing them. See the GigaDB repository [54] for an interactive version of the network (see Supplementary Dataset S2).
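
The construction described in this caption, a network linking Wikipedia articles to the DOIs they cite and restricted to DOIs cited in more than one article, can be sketched roughly as below. This is an illustrative assumption about the general approach, not the authors' pipeline: the function name `build_shared_citation_network`, the article titles other than "COVID-19 drug development", and the placeholder DOIs are all hypothetical.

```python
from collections import defaultdict


def build_shared_citation_network(article_dois):
    """Build article-DOI edges, keeping only DOIs cited in more than one article.

    article_dois maps a Wikipedia article title to the set of DOIs it cites.
    Returns (edges, degree): a sorted list of (article, doi) pairs and a dict
    giving each node's number of connections (the node size in the figure).
    """
    # Invert the mapping so each DOI knows which articles cite it.
    doi_to_articles = defaultdict(set)
    for article, dois in article_dois.items():
        for doi in dois:
            doi_to_articles[doi].add(article)

    # Keep only DOIs shared by at least two articles.
    edges = sorted(
        (article, doi)
        for doi, articles in doi_to_articles.items()
        if len(articles) > 1
        for article in articles
    )

    # Count the connections of every node that remains in the network.
    degree = defaultdict(int)
    for article, doi in edges:
        degree[article] += 1
        degree[doi] += 1
    return edges, dict(degree)


# Toy input with placeholder DOIs, not data from the study.
toy_corpus = {
    "COVID-19 drug development": {"10.0000/example.a", "10.0000/example.b"},
    "Hypothetical article B": {"10.0000/example.a"},
    "Hypothetical article C": {"10.0000/example.b", "10.0000/example.c"},
}
edges, degree = build_shared_citation_network(toy_corpus)
print(len(edges), degree)
```

In this toy input, the placeholder DOI cited by a single article is dropped, which mirrors the caption's restriction to papers cited in more than one article.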
