TextHunter - A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research

Unlocking the Power of Clinical Notes: Natural Language Processing in Healthcare

Acta Scientific Medical Sciences, 2024

Electronic Health Records (EHRs) have become the backbone of modern healthcare, providing a comprehensive record of a patient's medical journey. However, a significant portion of this data resides in clinical notes, which predominantly consist of unstructured text. While valuable for consumption by medical professionals, this format presents challenges for traditional data analysis methods. Natural Language Processing (NLP) offers a powerful solution to structure this information and unlock the potential of clinical notes. This paper explores the application of NLP tasks within the healthcare domain, specifically focusing on EHR data. We delve into the NLP pipeline, differentiating between essential upstream tasks like tokenization and downstream tasks like named entity recognition (NER) and relation extraction. We showcase how NLP can extract crucial clinical information through these tasks and emphasize the importance of de-identification for maintaining patient privacy. A major challenge in NLP for healthcare is the limited availability of labeled clinical data. We discuss this bottleneck and explore potential solutions like active learning and transfer learning. Finally, the paper highlights the transformative potential of NLP in healthcare data processing and paves the way for future advancements in this dynamic field.
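
The pipeline framing described above (upstream tokenization feeding downstream NER and de-identification) can be illustrated with a minimal sketch. spaCy and its general-purpose en_core_web_sm model stand in here for a clinical NLP engine, and the PERSON/DATE masking rule is an illustrative assumption rather than the paper's method.

```python
import spacy

# Assumes the model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

note = "John Smith was admitted on 2021-03-14 with chest pain and started on aspirin."
doc = nlp(note)

# Upstream task: tokenization
tokens = [t.text for t in doc]

# Downstream task: named entity recognition
entities = [(ent.text, ent.label_) for ent in doc.ents]

# De-identification sketch: mask person names and dates before secondary use
deidentified = note
for ent in doc.ents:
    if ent.label_ in {"PERSON", "DATE"}:
        deidentified = deidentified.replace(ent.text, f"[{ent.label_}]")

print(tokens)
print(entities)
print(deidentified)
```

A production clinical system would swap in models trained on clinical text and a dedicated de-identification component, but the upstream/downstream division stays the same.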

ADVANCEMENTS IN BIOMEDICAL NATURAL LANGUAGE PROCESSING: EXTRACTING INSIGHTS FROM HEALTHCARE TEXT DATA

Biomedical Natural Language Processing (BioNLP) is revolutionizing healthcare by enabling the extraction of valuable insights from vast amounts of textual data. This research paper explores recent developments in BioNLP methodologies and applications. Topics include entity recognition in clinical notes, relation extraction from biomedical literature, and the utilization of deep learning for medical text understanding. Additionally, ethical considerations in handling sensitive health information are discussed. The study contributes to the growing field of Biomedical NLP, showcasing its potential impact on clinical decision-making, research, and healthcare management.

Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help

2013

Understanding of Electronic Medical Records (EMRs) plays a crucial role in improving healthcare outcomes. However, the unstructured nature of EMRs poses several technical challenges for extracting structured information from clinical notes for automatic analysis. While the Natural Language Processing (NLP) techniques developed to process EMRs are effective for a variety of tasks, they often fail to preserve the semantics of the original information expressed in EMRs, particularly in complex scenarios. This paper illustrates the complexity of the problems involved, examines the conflicts created by the shortcomings of NLP techniques, and demonstrates where domain-specific knowledge bases can come to the rescue in resolving those conflicts, significantly improving semantic annotation and structured information extraction. We discuss various insights gained from our study on a real-world dataset.
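
As a rough sketch of how a domain knowledge base can resolve an annotation conflict, the toy lookup below picks the candidate concept whose cue words best overlap the sentence context; the mini knowledge base and the overlap heuristic are invented for illustration and are not the paper's approach.

```python
# Toy domain knowledge base: ambiguous surface form -> candidate concepts with context cues
KNOWLEDGE_BASE = {
    "cold": [
        {"concept": "Common cold (disorder)", "cues": {"cough", "fever", "congestion"}},
        {"concept": "Cold sensation (finding)", "cues": {"extremities", "feels", "touch"}},
    ]
}

def resolve(term, context_tokens):
    """Pick the candidate concept whose cue words best overlap the sentence context."""
    candidates = KNOWLEDGE_BASE.get(term.lower(), [])
    if not candidates:
        return "UNMAPPED"
    return max(candidates, key=lambda c: len(c["cues"] & context_tokens))["concept"]

sentence = "patient reports cough and fever consistent with a cold"
print(resolve("cold", set(sentence.split())))  # -> Common cold (disorder)
```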

Information Extraction from Electronic Medical Records using Natural Language Processing Techniques

Journal of Applied Sciences and Environmental Management

Patients share key information about their health with medical practitioners during clinic consultations. This key information may include their past medications and allergies, current situations/issues, and expectations. Healthcare professionals store this information in an Electronic Medical Record (EMR). EMRs have empowered research in healthcare; the information hidden in them, if harnessed properly through Natural Language Processing (NLP), can be used for disease registries, drug safety, epidemic surveillance, disease prediction, and treatment. This work illustrates the application of NLP techniques to design and implement a Key Information Retrieval System (KIRS framework) using the Latent Dirichlet Allocation algorithm. The cross-industry standard process for data mining methodology was applied in an experiment with an EMR dataset from PubMed to demonstrate the framework. The new system extracted the common problems (ailments) and prescriptions across the five (5) countries pr...
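
A minimal sketch of the LDA step at the heart of such a framework is shown below using scikit-learn; the toy notes, two-topic setting, and other parameters are illustrative assumptions, not the paper's dataset or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

notes = [
    "patient presents with malaria fever treated with artemisinin",
    "hypertension managed with amlodipine follow up in two weeks",
    "malaria recurrence prescribed artemisinin combination therapy",
    "elevated blood pressure continued amlodipine and lifestyle advice",
]

# Bag-of-words counts feed the topic model
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(notes)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Top words per topic surface recurring ailments and prescriptions
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```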

Unsupervised method for extracting machine understandable medical knowledge from a large free text collection

AMIA Annual Symposium Proceedings, 2009

Definitions of medical concepts (e.g., diseases, drugs) are essential background knowledge for researchers, clinicians, and health care consumers. However, the rapid growth of biomedical research means that such knowledge needs continual updating. To address this problem, we have developed an unsupervised pattern learning approach that extracts disease and drug definitions from automatically structured randomized clinical trial (RCT) abstracts. In addition, each extracted definition is semantically classified without relying on external medical knowledge. When used to identify definitions from 100 manually annotated RCT abstracts, our medical definition knowledge base has a precision of 0.97, recall of 0.93, F1 of 0.94, and semantic classification accuracy of 0.96.
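
The flavor of pattern-based definition extraction can be conveyed with a short sketch; the regular-expression templates and example sentences below are invented stand-ins for the lexical patterns the paper learns automatically.

```python
import re

# Hypothetical definition templates of the kind an unsupervised learner might acquire
PATTERNS = [
    re.compile(r"(?P<term>[A-Z][\w\- ]+?) is an? (?P<definition>[^.]+)\."),
    re.compile(r"(?P<term>[A-Z][\w\- ]+?), defined as (?P<definition>[^,.]+)[,.]"),
]

abstract = (
    "Metformin is an oral antihyperglycemic agent used in type 2 diabetes. "
    "Psoriasis, defined as a chronic inflammatory skin disease, was the primary outcome."
)

for pattern in PATTERNS:
    for match in pattern.finditer(abstract):
        print(match.group("term"), "->", match.group("definition"))
```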

Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010

Journal of the American Medical Informatics Association, 2011

Objective As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge. Design The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline. Measurements Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set. Results The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second). Conclusion For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
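
The micro-averaged F-score used for these measurements can be reproduced in a few lines; the toy gold and system annotations below are invented, and exact-match (span, type) comparison is assumed purely for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over exact-match (span, type) annotations per document."""
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# One document: the system recovers two of three gold concepts and adds one spurious span
gold = [{("chest pain", "problem"), ("ECG", "test"), ("aspirin", "treatment")}]
pred = [{("chest pain", "problem"), ("aspirin", "treatment"), ("pain", "problem")}]
print(round(micro_f1(gold, pred), 4))  # tp=2, fp=1, fn=1 -> 0.6667
```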

A knowledge discovery and reuse pipeline for information extraction in clinical notes

Journal of the American Medical Informatics Association, 2011

Objective Information extraction and classification of clinical data are current challenges in natural language processing. This paper presents a cascaded method to deal with three different extractions and classifications in clinical data: concept annotation, assertion classification and relation classification. Materials and Methods A pipeline system was developed for clinical natural language processing that includes a proofreading process, with gold-standard reflexive validation and correction. The information extraction system is a combination of a machine learning approach and a rule-based approach. The outputs of this system are used for evaluation in all three tiers of the fourth i2b2/VA shared-task and workshop challenge. Results Overall concept classification attained an F-score of 83.3% against a baseline of 77.0%, the optimal F-score for assertions about the concepts was 92.4%, and the relation classifier attained 72.6% for relationships between clinical concepts against a baseline of 71.0%. Micro-average results for the challenge test set were 81.79%, 91.90% and 70.18%, respectively. Discussion The multi-task challenge requires distributing time and workload across the individual tasks so that the overall performance evaluation on all three tasks is more informative than treating each task assessment as independent. The simplicity of the model developed in this work should be contrasted with the very large feature space of other participants in the challenge, who achieved only slightly better performance. There is a need to charge a penalty against the complexity of a model, as defined in message minimalisation theory, when comparing results. Conclusion A complete pipeline system is presented for constructing language processing models that can handle multiple practical detection tasks over the language structures of clinical records.
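
One tier of such a cascade, assertion classification, can be sketched as a simple supervised text classifier; the tiny training set, bag-of-n-grams features, and logistic regression below are assumptions chosen for brevity, not the hybrid machine-learning/rule-based system the paper describes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sentences containing an annotated problem concept, labelled with its assertion status
sentences = [
    "patient denies chest pain",
    "no evidence of pneumonia on imaging",
    "presents with acute chest pain",
    "pneumonia confirmed by chest x-ray",
    "possible early pneumonia, will monitor",
    "cannot rule out pulmonary embolism",
]
assertions = ["absent", "absent", "present", "present", "possible", "possible"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(sentences, assertions)

# Likely ['absent'], given the overlap with the first training sentence
print(model.predict(["patient denies any shortness of breath"]))
```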

A scoping review of publicly available language tasks in clinical natural language processing

Journal of the American Medical Informatics Association

Objective To provide a scoping review of papers on clinical natural language processing (NLP) shared tasks that use publicly available electronic health record data from a cohort of patients. Materials and Methods We searched 6 databases, including biomedical research and computer science literature databases. Rounds of title/abstract screening and full-text screening were conducted by 2 reviewers. Our method followed the PRISMA-ScR guidelines. Results A total of 35 papers with 48 clinical NLP tasks met inclusion criteria between 2007 and 2021. We categorized the tasks by the type of NLP problem, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced as potential clinical decision support applications, such as substance abuse detection and phenotyping. We summarized the tasks by publication venue and dataset type. Discussion The breadth of clinical NLP tasks continues to grow as the field of NLP evolves with advancements in language sys...

Advancement of natural language programming, machine learning and electronic health records for the digital health science field

International Journal of Health Sciences

The widespread use of “electronic health record systems (EHRs)” in health care provides a large amount of real-world data, opening up new opportunities for medical trials. “Deep learning, a subset of machine learning (ML)”, has experienced a meteoric rise in popularity over the last six years, owing to advances in computing power and the accessibility of enormous new datasets. “Natural language processing (NLP)” approaches have been used as an “artificial intelligence strategy” to obtain information from medical narratives in EHRs, as a huge quantity of useful clinical knowledge is contained in clinical stories. This NLP capacity may enable “automated chart review in clinical care to identify individuals with distinct clinical features” and decrease methodological heterogeneity in establishing “phenotypes, masking biological heterogeneity in allergy, asthma, and immunology research”. The aim of this research paper is to understand advanced technologies such as “Machine Learning, Nat...