Supervised machine learning and active learning in classification of radiology reports

Dung H M Nguyen et al. J Am Med Inform Assoc. 2014 Sep-Oct.

Abstract

Objective: This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry.

Materials and methods: In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize the production of training data and to further improve classification performance. The project involved two pilot sites in Victoria, Australia (Lake Imaging in Ballarat and Peter MacCallum Cancer Centre in Melbourne) and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney).
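
The abstract names CRFs and SVMs but gives no feature set or training details. As a rough illustration of the SVM side only, the sketch below trains a linear SVM on bag-of-words features with the Pegasos stochastic sub-gradient method. The tokenizer, the toy report snippets, the +1/-1 label convention and every hyperparameter are assumptions made for this example. It is not the authors' pipeline, and Pegasos is a swapped-in training method chosen because it fits in plain Python.

```python
import random
import re
from collections import Counter


def featurize(text):
    """Sparse bag-of-words vector: Counter mapping lower-cased token -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def dot(w, x):
    """Dot product of a dict weight vector with a sparse Counter feature vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_pegasos(data, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss, L2 penalty, no bias) trained with Pegasos.

    data: list of (Counter, label) pairs, label +1 (reportable) or -1 (non-reportable).
    """
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)           # Pegasos step size 1/(lambda * t)
            violated = y * dot(w, x) < 1.0  # margin checked before the update
            for k in w:                      # shrink step: w <- (1 - eta*lambda) * w
                w[k] *= 1.0 - eta * lam
            if violated:                     # hinge-loss sub-gradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    """+1 = reportable, -1 = non-reportable (hypothetical convention)."""
    return 1 if dot(w, featurize(text)) >= 0 else -1


# Toy, made-up report snippets; not data from the paper.
train_texts = [
    ("biopsy confirms malignant carcinoma", 1),
    ("metastatic lesion consistent with malignancy", 1),
    ("no evidence of malignancy normal study", -1),
    ("normal scan no abnormality", -1),
]
w = train_pegasos([(featurize(t), y) for t, y in train_texts])
print(predict(w, "carcinoma confirmed on biopsy"))  # 1
print(predict(w, "normal scan"))                    # -1
```

A real system would add a bias term, richer features and a held-out evaluation; the point here is only the shape of a linear reportability classifier.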

Results: The reportability classifier achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. AL reduced the amount of training data needed for supervised machine learning by up to 92%.
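
Sensitivity and specificity follow the usual confusion-matrix definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). One plausible reading of the "up to 92%" figure is the fraction of manual labels AL avoids while reaching the same target performance as passive (random) selection. The helper names and all counts below are hypothetical and are not the paper's test-set numbers.

```python
def sensitivity(tp, fn):
    """Share of truly reportable reports that the classifier flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly non-reportable reports that the classifier clears: TN / (TN + FP)."""
    return tn / (tn + fp)


def labelling_saving(labels_needed_with_al, labels_needed_passive):
    """Fraction of manual labels saved when AL reaches the target performance
    with fewer labelled examples than passive selection needs (assumed definition)."""
    return 1.0 - labels_needed_with_al / labels_needed_passive


# Hypothetical counts for illustration only.
print(round(sensitivity(tp=95, fn=5), 4))     # 0.95
print(round(specificity(tn=180, fp=20), 4))   # 0.9
print(round(labelling_saving(400, 5000), 4))  # 0.92
```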

Discussion: AL is a promising method for optimizing the production of supervised training data for classifying radiology reports. When an AL strategy is applied during data selection, the cost of manual classification can be reduced substantially.
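
The AL setting described here can be read as pool-based: a model repeatedly chooses which unlabelled reports a human should classify next, and each manual label is the cost being reduced. The sketch below is a generic loop under that assumption, demonstrated on a toy 1-D pool where each item is a score and the hidden label flips at a threshold. The function names, the threshold, the seed size and the budget are all hypothetical; real report classification would plug a trained classifier into `query`.

```python
import random


def pool_based_al(pool, oracle, query, seed_size, batch_size, budget, rng):
    """Generic pool-based active learning loop (a sketch, not the paper's code).

    pool:   items that could be sent for manual classification
    oracle: returns the manual label of one item (each call = one unit of annotation cost)
    query:  query(unlabelled, labelled_pairs, k) -> list of k items to label next
    """
    unlabelled = list(pool)
    rng.shuffle(unlabelled)
    # Random seed set so the first model has something to learn from.
    labelled = {x: oracle(x) for x in unlabelled[:seed_size]}
    unlabelled = unlabelled[seed_size:]
    while len(labelled) < budget and unlabelled:
        k = min(batch_size, budget - len(labelled), len(unlabelled))
        batch = query(unlabelled, list(labelled.items()), k)
        for x in batch:
            labelled[x] = oracle(x)
        chosen = set(batch)
        unlabelled = [x for x in unlabelled if x not in chosen]
    return labelled


def boundary_interval(labelled_pairs):
    """Uncertainty interval: largest score labelled 0 .. smallest score labelled 1."""
    lo = max((x for x, y in labelled_pairs if y == 0), default=0.0)
    hi = min((x for x, y in labelled_pairs if y == 1), default=1.0)
    return lo, hi


def uncertainty_query(unlabelled, labelled_pairs, k):
    """Pick the k unlabelled scores closest to the middle of the uncertainty interval."""
    lo, hi = boundary_interval(labelled_pairs)
    mid = (lo + hi) / 2
    return sorted(unlabelled, key=lambda x: abs(x - mid))[:k]


sampler_rng = random.Random(0)


def random_query(unlabelled, labelled_pairs, k):
    """Passive baseline: uniform random choice."""
    return sampler_rng.sample(unlabelled, k)


# Toy pool: 1000 scores in [0, 1); hypothetical label 1 iff score >= 0.6305.
pool = [i / 1000 for i in range(1000)]
oracle = lambda x: int(x >= 0.6305)

al = pool_based_al(pool, oracle, uncertainty_query, 10, 1, 25, random.Random(1))
rs = pool_based_al(pool, oracle, random_query, 10, 1, 25, random.Random(1))
for name, lab in (("AL", al), ("random", rs)):
    lo, hi = boundary_interval(list(lab.items()))
    print(name, "boundary located within", round(hi - lo, 4))
```

With one label per round, the uncertainty query behaves like a binary search and pins the toy boundary down to adjacent grid points within the 25-label budget, while random selection usually leaves a wider interval.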

Conclusions: The most important practical application of the reportability classifier is that it can dramatically reduce the human effort needed to identify, within a large pool of imaging reports, those relevant for further cancer investigation. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries.

Keywords: Classification; Radiology Information Systems; active learning; machine learning.

Published by the BMJ Publishing Group Limited.

Figures

Figure 1

System architecture. CRF, conditional random fields; SVM, support vector machine.

Figure 2

Example of a postprocessing rule. LPN, Lexical polarity negative.
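
The rule shown in Figure 2 is not reproduced in this abstract. Purely as a hypothetical example of the kind of postprocessing that a negative lexical polarity (LPN) label could drive, the sketch below uses a NegEx-style window to mark cancer terms as negated and demotes a "reportable" prediction when every such term is negated. The term lists, the three-token window and the function names are invented for illustration.

```python
import re

# Hypothetical vocabularies and window size, chosen only for illustration.
CANCER_TERMS = {"carcinoma", "malignancy", "malignant", "metastasis", "metastases", "tumour", "tumor"}
NEGATION_CUES = {"no", "not", "without", "negative"}
WINDOW = 3  # a cue counts if it occurs within this many tokens before the term


def term_polarities(text):
    """Label each cancer term 'negative' if a negation cue closely precedes it, else 'positive'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = []
    for i, tok in enumerate(tokens):
        if tok in CANCER_TERMS:
            preceding = set(tokens[max(0, i - WINDOW):i])
            result.append((tok, "negative" if preceding & NEGATION_CUES else "positive"))
    return result


def postprocess(text, predicted_reportable):
    """Demote a 'reportable' prediction when every cancer term in the report is negated."""
    polarities = term_polarities(text)
    if predicted_reportable and polarities and all(p == "negative" for _, p in polarities):
        return False
    return predicted_reportable


print(postprocess("No evidence of malignancy.", True))                      # False
print(postprocess("Findings consistent with metastatic carcinoma.", True))  # True
```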

Figure 3

Evaluation of active and random sampling on test data. AL, active learning; Balance, Balanced Exploration and Exploitation algorithm; KFF, Kernel Farthest-First algorithm; Self, Self-Confident algorithm.
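
Kernel Farthest-First (KFF) is commonly described as an exploration strategy: query the unlabelled point that lies farthest, in kernel feature space, from its nearest labelled point, so that sparsely covered regions of the data get labelled. The feature-space distance follows from the kernel alone, since ||phi(x) - phi(y)||^2 = k(x, x) - 2k(x, y) + k(y, y). A minimal sketch, assuming an RBF kernel with gamma = 1 and small dense vectors; the paper's kernel and feature representation are not stated in the abstract.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x, y, kernel):
    """Distance between x and y in the kernel's feature space: ||phi(x) - phi(y)||."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative values from rounding


def kernel_farthest_first(unlabelled, labelled, kernel=rbf_kernel):
    """Index of the unlabelled point whose nearest labelled point is farthest away.

    Assumes `labelled` is non-empty.
    """
    def gap(i):
        return min(kernel_distance(unlabelled[i], z, kernel) for z in labelled)
    return max(range(len(unlabelled)), key=gap)


labelled = [(0.0, 0.0), (3.0, 0.0)]
unlabelled = [(0.1, 0.0), (1.5, 0.0), (2.9, 0.0)]
print(kernel_farthest_first(unlabelled, labelled))  # 1
```

The middle point is chosen because the other two sit almost on top of already-labelled points.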

Figure 4

Evaluation of active and random sampling on test data for identifying reportable cases. AL, active learning; Balance, Balanced Exploration and Exploitation algorithm; KFF, Kernel Farthest-First algorithm; Self, Self-Confident algorithm.
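
The Balanced Exploration and Exploitation (Balance) algorithm mixes exploration steps, such as KFF, with exploitation steps that query near the current decision boundary. The published algorithm is reported to adapt its probability of exploring as learning proceeds; the simplified sketch below keeps that probability fixed and uses Euclidean distance and a given linear boundary, all of which are assumptions for illustration rather than the configuration used in the paper.

```python
import math
import random


def explore(unlabelled, labelled):
    """Farthest-first step: Euclidean distance stands in for a kernel distance here."""
    return max(range(len(unlabelled)),
               key=lambda i: min(math.dist(unlabelled[i], z) for z in labelled))


def exploit(unlabelled, w, b):
    """Margin step: the point closest to the linear boundary w.x + b = 0."""
    return min(range(len(unlabelled)),
               key=lambda i: abs(sum(wj * xj for wj, xj in zip(w, unlabelled[i])) + b))


def balance_query(unlabelled, labelled, w, b, p_explore, rng):
    """Explore with probability p_explore, otherwise exploit (fixed p: a simplification)."""
    if rng.random() < p_explore:
        return explore(unlabelled, labelled)
    return exploit(unlabelled, w, b)


rng = random.Random(0)
labelled = [(0.0, 0.0)]
unlabelled = [(0.1, 0.0), (2.0, 2.0), (0.6, 1.0)]
w, b = (1.0, 0.0), -0.5  # hypothetical current classifier: boundary at x = 0.5
print(balance_query(unlabelled, labelled, w, b, 1.0, rng))  # 1 (pure exploration)
print(balance_query(unlabelled, labelled, w, b, 0.0, rng))  # 2 (pure exploitation)
```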

Figure 5

Full learning curves for Simple active learning and random sampling with a batch size of 100. AL, active learning.
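
The "Simple" strategy most likely refers to margin-based selection for SVMs, which queries the unlabelled examples closest to the current hyperplane. Assuming that reading, one batch of 100 can be selected as below. `w` and `b` stand for a hypothetical trained linear SVM, and |w.x + b| is used as the ranking key because dividing by ||w|| does not change the order.

```python
import heapq


def simple_margin_batch(w, b, unlabelled, batch_size=100):
    """Indices of the `batch_size` unlabelled vectors with the smallest |w.x + b|,
    i.e. those closest to the current linear SVM hyperplane (unnormalised distance)."""
    def distance_to_boundary(i):
        return abs(sum(wj * xj for wj, xj in zip(w, unlabelled[i])) + b)
    return heapq.nsmallest(batch_size, range(len(unlabelled)), key=distance_to_boundary)


# Tiny demo with batch_size=2 so the result is easy to check by eye.
pool = [(-3.0,), (0.2,), (5.0,), (-0.1,), (1.0,)]
print(simple_margin_batch((1.0,), 0.0, pool, batch_size=2))  # [3, 1]
```

In a learning-curve experiment, each round would label the returned batch, retrain, and record test performance against the number of labelled reports; the random baseline replaces this function with a uniform draw of 100 indices.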
