Extracting formulaic and free text clinical research articles metadata using conditional random fields
Related papers
Journal of Biomedical Informatics, 2016
Information extraction from narrative clinical notes is useful for patient care, as well as for secondary use of medical data, for research or clinical purposes. Many studies have focused on information extraction from English clinical texts, but fewer have dealt with clinical notes in languages other than English. This study tested the feasibility of using "off the shelf" information extraction algorithms to identify medical concepts from Italian clinical notes. Among all the available and well-established information extraction algorithms, we used MetaMap to map medical concepts to the Unified Medical Language System (UMLS). The study addressed two questions: (Q1) to understand if it would be possible to properly map medical terms found in clinical notes and related to the semantic group of "Disorders" to the Italian UMLS resources; (Q2) to investigate if it would be feasible to use MetaMap as is to extract these medical concepts from Italian clinical notes. We perform...
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2014
Observational research using data from electronic health records (EHR) is a rapidly growing area, which promises both increased sample size and data richness - therefore unprecedented study power. However, in many medical domains, large amounts of potentially valuable data are contained within the free text clinical narrative. Manually reviewing free text to obtain desired information is an inefficient use of researcher time and skill. Previous work has demonstrated the feasibility of applying Natural Language Processing (NLP) to extract information. However, in real world research environments, the demand for NLP skills outweighs supply, creating a bottleneck in the secondary exploitation of the EHR. To address this, we present TextHunter, a tool for the creation of training data, construction of concept extraction machine learning models and their application to documents. Using confidence thresholds to ensure high precision (>90%), we achieved recall measurements as high as 99...
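The abstract above mentions using confidence thresholds to keep precision above 90%, but does not describe how the thresholds are chosen. The following is a minimal sketch of one generic approach, not TextHunter's actual implementation: it scans candidate thresholds on a labelled validation set and returns the lowest one that still meets a precision target, which gives the highest recall among the thresholds that meet it. The function name `pick_threshold`, the `(confidence, is_positive)` input format, and the sample data are all assumptions made for illustration.

```python
def pick_threshold(scored, min_precision=0.9):
    """Return (threshold, precision, recall) for the lowest confidence
    threshold whose precision is at least min_precision, or None.

    scored: list of (confidence, is_positive) pairs from a labelled
            validation set (hypothetical input format).
    """
    total_pos = sum(1 for _, label in scored if label)
    best = None
    # Walk thresholds from strictest to most lenient; the last one that
    # satisfies the precision target is the lowest acceptable threshold.
    for t in sorted({conf for conf, _ in scored}, reverse=True):
        kept = [label for conf, label in scored if conf >= t]
        tp = sum(kept)  # True counts as 1
        precision = tp / len(kept)
        recall = tp / total_pos if total_pos else 0.0
        if precision >= min_precision:
            best = (t, precision, recall)
    return best


# Illustrative, made-up validation scores.
validation = [
    (0.95, True), (0.90, True), (0.80, True),
    (0.70, False), (0.60, True), (0.40, False),
]
result = pick_threshold(validation, min_precision=0.9)
print(result)
```

Lowering the threshold past 0.80 in this toy data admits a false positive and drops precision below the target, so the scan settles on 0.80 and accepts the recall that comes with it.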
Context Thesaurus for the Extraction of Metadata from Medical Research Papers
Much of the academic literature available on the Web has never been adequately catalogued. Consequently, even using large-scale search engines, much of it remains inaccessible to researchers, as indexing on this scale lacks the necessary detail to cope with discipline-dependent terminologies and ontologies. Metadata has become a popular means to provide such information within known domains. In this paper, we describe an approach to the automatic extraction of metadata from medical research papers. Medical research papers tend to have stereotypic prescribed sections, such as introduction, methods, and conclusions. The approach described uses context thesauri and the semantic structure of the documents to extract metadata based on these stereotypic sections.
A Scoping Review of Adopted Information Extraction Methods for RCTs
Medical Journal of The Islamic Republic of Iran, 2023
Background: Randomized controlled trials (RCTs) provide the strongest evidence for therapeutic interventions and their effects on groups of subjects. However, the large amount of unstructured information in these trials makes it challenging and time-consuming to make decisions and identify important concepts and valid evidence. This study aims to explore methods for automating or semi-automating information extraction from reports of RCT studies. Methods: We conducted a systematic search of PubMed, ACM Digital Library, and Web of Science to identify relevant articles published between January 1, 2010, and 2022. We focused on published Natural Language Processing (NLP), machine learning, and deep learning methods that automate or semi-automate key elements of information extraction in the context of RCTs. Results: A total of 26 publications were included, which discussed the automatic extraction of key characteristics of RCTs using various PICO frameworks (PIBOSO and PECODR). Among these publications, 14 (53.8%) extracted key characteristics based on PICO, PIBOSO, and PECODR, while 12 (46.1%) discussed information extraction methods in RCT studies. Common approaches mentioned included word/phrase matching, machine learning algorithms such as binary classification using the Naïve Bayes algorithm and powerful BERT network for feature extraction, support vector machine for data classification, conditional random field, non-machine-dependent automation, and machine learning or deep learning approaches. Conclusion: The lack of publicly available software and limited access to existing software make it difficult to determine the most powerful information extraction system. However, deep learning models like Transformers and BERT language models have shown better performance in natural language processing.
Entity Extraction for Clinical Notes, a Comparison Between MetaMap and Amazon Comprehend Medical
Studies in Health Technology and Informatics, 2021
Extracting meaningful information from clinical notes is challenging due to their semi-structured or unstructured format. Clinical notes such as discharge summaries contain information about diseases, their risk factors, and the treatment approaches associated with them. As such, it is critical for healthcare quality as well as for clinical research to extract this information and make it accessible to other computerized applications that rely on coded data. In this context, the goal of this paper is to compare the automatic medical entity extraction capacity of two available entity extraction tools: MetaMap (MM) and Amazon Comprehend Medical (ACM). Recall, precision and F-score have been used to evaluate the performance of the tools. The results show that ACM achieves higher average recall, average precision, and average F-score in comparison with MM.
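For readers unfamiliar with the metrics named in the abstract above, here is a minimal sketch of how recall, precision and F-score are commonly computed for entity extraction. It assumes exact matching on `(text, type)` pairs, which the abstract does not confirm as the paper's matching criterion. The function name and the sample entities are hypothetical and do not come from MetaMap or Amazon Comprehend Medical output.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for two collections of entities,
    counting a prediction as correct only on an exact match (assumption)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up gold annotations and tool output for one discharge summary.
gold_entities = {
    ("diabetes", "DISEASE"),
    ("hypertension", "DISEASE"),
    ("metformin", "DRUG"),
}
tool_entities = {
    ("diabetes", "DISEASE"),
    ("metformin", "DRUG"),
    ("aspirin", "DRUG"),
    ("fever", "SYMPTOM"),
}

p, r, f1 = precision_recall_f1(tool_entities, gold_entities)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of the four extracted entities are correct and two of the three gold entities are found, so precision and recall differ; the F-score is their harmonic mean. Averaging these values across documents, as the abstract describes, would be a separate step.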