Effective Classification of Clinical Reports: Natural Language Processing-Based and Topic Modeling-Based Approaches (original) (raw)

Abstract

More and more, patient information is being stored in digital formats;however, to be able to maximize its usefulness, automated tools needs to be built that can effectively and efficiently process these records. Clinical decision support systems are such tools that can recommend the need for a certain medical test or therapy by examining prior patient information. This can help the clinician avoid unnecessary or potentially harmful tests or therapies. In addition, this type of automated analysis of patient data can help medical professionals make clinical decisions much faster and with more confidence. As such, the speed and quality of healthcare would be improved with reduced costs. One popular automated use of clinical reports is predicting the existence or absence of certain conditions in a given report. This type of analysis, called text classification in general, can learn the characteristics of such conditions from a previously labeled dataset of clinical reports. In this research, novel techniques for better performance of automated classification of clinical reports are developed and compared with conventional approaches. As a first step, classifiers using the raw text of the reports with standard preprocessing techniques are implemented. Additionally, biomedical NLP tools are used to extract the relevant information from the reports in a more consistent way. These extracted features are classified using conventional classifiers, including decision trees and support vector machines (SVM). While results show that the classification performance is significantly improved by using the NLP features over using the raw text, this NLP-based classification is computationally expensive and requires a significant number of manual steps to be used effectively across many different clinical areas. As an alternative, a framework for topic modeling-based classification system is built. Topic modeling techniques automatically find the interpretable themes that exist in a document collection. These topics are used to represent each report and different classifiers are built based on this representation. This system has the advantage of being more adaptable to different clinical domains than custom NLP-based classifiers. It also provides dimension reduction because there are fewer potential topic categories than the number of words in a vocabulary. The performance of topic modeling-based classifiers is better than classification using raw text and it is competitive with classification using NLP features. In addition, they provide a compact and interpretable representation. Results from this dissertation research have significant impacts on the quality and efficiency of healthcare. First of all, the classifiers built in this research can be used to automatically predict the conditions in a clinical report. They can replace the manual review of clinical reports, which can be time consuming and error-prone. In addition, with the increased accuracy and interpretability they provide, clinicians can have more confidence in utilizing such systems in real life settings.

Efsun Sarioglu Kayi hasn't uploaded this paper.

Let Efsun know you want this paper to be uploaded.

Ask for this paper to be uploaded.