Toward systematic review automation: a practical guide to using machine learning tools in research synthesis - PubMed (original) (raw)

Editorial

Toward systematic review automation: a practical guide to using machine learning tools in research synthesis

Iain J Marshall et al. Syst Rev. 2019.

Abstract

Technologies and methods to speed up the production of systematic reviews by reducing the manual labour involved have recently emerged. Automation has been proposed or used to expedite most steps of the systematic review process, including search, screening, and data extraction. However, how these technologies work in practice and when (and when not) to use them is often not clear to practitioners. In this practical guide, we provide an overview of current machine learning methods that have been proposed to expedite evidence synthesis. We also offer guidance on which of these are ready for use, their strengths and weaknesses, and how a systematic review team might go about using them in practice.

Keywords: Evidence synthesis; Machine learning; Natural language processing.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1

Fig. 1

Classifying text using machine learning, in this example logistic regression with a ‘bag of words’ representation of the texts. The system is ‘trained’, learning a coefficient (or weight) for each unique word in a manually labelled set of documents (typically in the 1000s). In use, the learned coefficients are used to predict a probability for an unknown document

Fig. 2

Fig. 2

Bag of words modelling for classifying RCTs. Top left: Example of bag of words for three articles. Each column represents a unique word in the corpus (a real example would likely contain columns for 10,000s of words). Top right: Document labels, where 1 = relevant and 0 = irrelevant. Bottom: Coefficients (or weights) are estimated for each word (in this example using logistic regression). In this example, high +ve weights will increase the predicted probability that an unseen article is an RCT where it contains the words ‘random’ or ‘randomized’. The presence of the word ‘systematic’ (with a large negative weight) would reduce the predicted probability that an unseen document is an RCT

Fig. 3

Fig. 3

Schematic of a typical data extraction process. The above illustration concerns the example task of extracting the study sample size. In general, these tasks involve labelling individual words. The word (or ‘token’) at position t is represented by a vector. This representation may encode which word is at this position and likely also communicates additional features, e.g. whether the word is capitalized or if the word is (inferred to be) a noun. Models for these kinds of tasks attempt to assign labels all T words in a document and for some tasks will attempt to maximize the joint likelihood of these labels to capitalize on correlations between adjacent labels

Fig. 4

Fig. 4

Typical workflow for semi-automated abstract screening. The asterisk indicates that with uncertainty sampling, the articles which are predicted with least certainty are presented first. This aims to improve the model accuracy more efficiently

Similar articles

Cited by

References

    1. Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7:e1000326. doi: 10.1371/journal.pmed.1000326. - DOI - PMC - PubMed
    1. Allen IE, Olkin I. Estimating time to conduct a meta-analysis from number of citations retrieved. JAMA. 1999;282:634–635. doi: 10.1001/jama.282.7.634. - DOI - PubMed
    1. Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open. 2017;7:e012545. doi: 10.1136/bmjopen-2016-012545. - DOI - PMC - PubMed
    1. Johnston E. How quickly do systematic reviews go out of date? A survival analysis. J Emerg Med. 2008;34:231. doi: 10.1016/j.jemermed.2007.11.022. - DOI
    1. Tsafnat G., Dunn A., Glasziou P., Coiera E. The automation of systematic reviews. BMJ. 2013;346(jan10 1):f139–f139. doi: 10.1136/bmj.f139. - DOI - PubMed

Publication types

MeSH terms

LinkOut - more resources