A natural language processing approach for identifying temporal disease onset information from mental healthcare text
Related papers
Schizophrenia Bulletin, 2020
Background: Using novel data mining methods such as natural language processing (NLP) on electronic health records (EHRs) for screening and detecting individuals at risk for psychosis. Method: The study included all patients receiving a first index diagnosis of nonorganic and nonpsychotic mental disorder within the South London and Maudsley (SLaM) NHS Foundation Trust between January 1, 2008, and July 28, 2018. Least Absolute Shrinkage and Selection Operator (LASSO)-regularized Cox regression was used to refine and externally validate a refined version of a five-item individualized, transdiagnostic, clinically based risk calculator previously developed (Harrell’s C = 0.79) and piloted for implementation. The refined version included 14 additional NLP-predictors: tearfulness, poor appetite, weight loss, insomnia, cannabis, cocaine, guilt, irritability, delusions, hopelessness, disturbed sleep, poor insight, agitation, and paranoia. Results: A total of 92 151 patients with a first index ...
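The abstract above reports discrimination as Harrell's C. As a rough illustration of what that statistic measures, the following minimal sketch computes it directly from its pairwise definition. The function name `harrell_c` and the toy data are assumptions for illustration only; they are not taken from the study, and the study's own risk calculator is not reproduced here.

```python
def harrell_c(times, events, risks):
    """Concordance index for right-censored data (pairwise definition).

    A pair (i, j) is comparable when subject i has the shorter follow-up
    time and actually experienced the event. The pair is concordant when
    the model gave subject i the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up time, event indicator, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk scores against observed event times.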
Natural Language Processing-Based Quantification of the Mental State of Psychiatric Patients
Computational Psychiatry
Psychiatric practice routinely uses semistructured and/or unstructured free text to record the behavior and mental state of patients. Many of these data are unstructured, lack standardization, and are difficult to use for analysis. Thus, it is difficult to quantitatively analyze a patient's illness trajectory over time and his or her responsiveness to treatment, and it is also difficult to compare different patients quantitatively. In this article, experts in the field of psychiatry, along with machine learning models, have collaboratively transformed patient data available in status assessments generated by physicians into binary vector representations. Data from patients with mental health disorders collected within a real-world clinical setting from one of the largest behavioral electronic health record (EHR) systems in the United States have been used for generating these representations. The binary vector representation of these health records is shown to be useful in various clinical tasks, such as disease phenotyping, characterizing the suicidality of patients, and inferring diagnoses. To summarize, this approach can transform semistructured free-text summaries of patients' status assessments into a structured, quantifiable format, which enriches the data that reside within EHR systems. This allows for effective intra-and interpatient quantifications and comparisons, which are much needed in the field of mental health. With the aid of these binary representations, patients' mental states can be systematically tracked over time, as can their responses to medications at the individual and population levels.
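To make the idea of a binary vector representation concrete, here is a minimal sketch that maps a free-text status note to a 0/1 vector and compares two visits. It substitutes a plain keyword lookup for the expert-guided machine learning models described in the abstract, and the vocabulary, function names, and example notes are all hypothetical.

```python
import re

# Hypothetical symptom vocabulary; not taken from the article.
SYMPTOMS = ["insomnia", "agitation", "hopelessness", "paranoia"]


def to_binary_vector(note, vocabulary=SYMPTOMS):
    """Return a 0/1 list with one position per vocabulary term."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return [1 if term in tokens else 0 for term in vocabulary]


def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Two invented notes for the same patient at different visits.
visit_1 = to_binary_vector("Patient reports insomnia and marked agitation.")
visit_2 = to_binary_vector("Sleep improved; some agitation, no other complaints.")
change = hamming_distance(visit_1, visit_2)
```

Because every note maps to a vector of the same length, visits and patients can be compared with ordinary vector operations. This sketch does no negation handling, so a phrase such as "denies insomnia" would still set the insomnia bit.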
BMC Psychiatry
Background: Developing predictive models for precision psychiatry is challenging because of unavailability of the necessary data: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this study was to construct a natural language processing (NLP) pipeline that extracts variables for building predictive models from EHRs. We specifically tailor the pipeline for extracting information on outcomes of psychiatry treatment trajectories, applicable throughout the entire spectrum of mental health disorders (“transdiagnostic”). Methods: A qualitative study into beliefs of clinical staff on measuring treatment outcomes was conducted to construct a candidate list of variables to extract from the EHR. To investigate if the proposed variables are suitable for measuring treatment effects, resulting themes were compared to transdiagnostic ...
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
As of May 15th, 2022, the novel coronavirus SARS-CoV-2 has infected 517 million people and resulted in more than 6.2 million deaths around the world. About 40% to 87% of patients suffer from persistent symptoms weeks or months after their original infection. Despite remarkable progress in preventing and treating acute COVID-19 conditions, the clinical diagnosis of long-term COVID remains difficult. In this work, we use free-text clinical notes and natural language processing (NLP) techniques to explore long-term COVID effects. We first obtain free-text clinical notes from 719 outpatient encounters representing patients treated by physicians at Emory Clinic to detect patterns in patients with long-term COVID symptoms. We apply state-of-the-art NLP frameworks to automatically identify patients with long-term COVID effects, achieving a recall (sensitivity) score of 0.881 for note-level prediction. We further interpret the prediction outcomes and discuss potential phenotypes. Our work aims to provide a data-driven solution to identify patients who have developed persistent symptoms after acute COVID infection. With this work, clinicians may be able to identify patients who have long-term COVID symptoms to optimize treatment.
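For readers unfamiliar with the metric quoted above, recall (sensitivity) is the fraction of truly positive notes that the classifier flags. The short sketch below computes it from note-level labels; the labels are invented for illustration and do not come from the Emory dataset.

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


# Hypothetical note-level labels: 1 = long-term COVID symptoms documented.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = recall(y_true, y_pred)
```

Recall ignores false positives, so it is usually read alongside precision or specificity.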
Natural language processing in mental health applications using non-clinical texts
Natural language processing (NLP) techniques can be used to make inferences about people's mental states from what they write on Facebook, Twitter and other social media. These inferences can then be used to create online pathways to direct people to health information and assistance and also to generate personalized interventions. Regrettably, the computational methods used to collect, process and utilize online writing data, as well as the evaluations of these techniques, are still dispersed in the literature. This paper provides a taxonomy of data sources and techniques that have been used for mental health support and intervention. Specifically, we review how social media and other data sources have been used to detect emotions and identify people who may be in need of psychological assistance; the computational techniques used in labeling and diagnosis; and finally, we discuss ways to generate and personalize mental health interventions. The overarching aim of this scoping review is to highlight areas of research where NLP has been applied in the mental health literature and to help develop a common language that draws together the fields of mental health, human-computer interaction and NLP.
Generating Positive Psychosis Symptom Keywords from Electronic Health Records
Artificial Intelligence in Medicine, 2019
The development of Natural Language Processing (NLP) solutions for information extraction from electronic health records (EHRs) has grown in recent years, as most clinically relevant information in EHRs is documented only in free text. One of the core tasks for any NLP system is to extract clinically relevant concepts such as symptoms. This information can then be used for more complex problems such as determining symptom onset, which requires temporal information. In the mental health domain, comprehensive vocabularies for specific disorders are scarce, and rarely contain keywords that reflect real-world terminology use. We explore the use of embedding techniques to automatically generate lexical variants of psychosis symptoms for vocabularies that can be used in complex downstream NLP tasks. We study the impact of the underlying text material on generating useful lexical entries, experimenting with different corpora and with unigram/bigram models. We also propose a method to automatically compute thresholds for choosing the most relevant terms. Our main contribution is a systematic study of unsupervised vocabulary generation using different corpora for an understudied clinical use-case. Resulting lexicons are publicly available.
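The vocabulary-generation idea described above can be sketched as a nearest-neighbour lookup in an embedding space: terms whose vectors lie close to a seed symptom are proposed as lexical variants. The example below is only an illustration under stated assumptions. The three-dimensional vectors are made up, the function names are hypothetical, and the fixed cutoff of 0.9 stands in for the automatic threshold computation the abstract mentions, which is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings (invented); a real system would train them on clinical text.
EMBEDDINGS = {
    "paranoia": [0.9, 0.1, 0.0],
    "paranoid": [0.85, 0.15, 0.05],
    "suspicious": [0.7, 0.3, 0.1],
    "appointment": [0.0, 0.2, 0.9],
}


def expand_term(seed, embeddings, threshold):
    """Return (word, similarity) pairs at or above threshold, best first."""
    seed_vec = embeddings[seed]
    scored = [
        (word, cosine(seed_vec, vec))
        for word, vec in embeddings.items()
        if word != seed
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Assumed fixed threshold, chosen by hand for this toy data.
variants = expand_term("paranoia", EMBEDDINGS, threshold=0.9)
variant_words = [word for word, _ in variants]
```

In practice the cutoff matters a great deal: too low and unrelated terms enter the lexicon, too high and genuine variants are missed.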
Natural language processing and modeling of clinical disease trajectories across brain disorders
Brain disorders, including neurodegenerative diseases and mental illnesses, are often difficult to diagnose and study due to clinical and pathological heterogeneity, overlap in clinical manifestations between disorders, and frequent comorbidities, hampering drug development and fundamental research. Hence, there is a clear need for data-driven approaches to disentangle these complex disorders. Here, we established a computational pipeline to process clinical summaries from donors with a wide range of brain disorders that were neuropathologically diagnosed by the Netherlands Brain Bank. First, we identified and defined 90 cross-disorder signs and symptoms within cognitive, motor, sensory, psychiatric, and general domains. Second, we trained and optimized natural language processing (NLP) models to identify these signs and symptoms in individual sentences of the extensive clinical summaries from donors of the NBB, resulting in temporal disease trajectories. Third, we studied the tempo...
Journal of Medical Internet Research, 2020
Background: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. Objective: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). Methods: We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available s...
Unlocking the Power of Clinical Notes: Natural Language Processing in Healthcare
Acta Scientific Medical Sciences, 2024
Electronic Health Records (EHRs) have become the backbone of modern healthcare, providing a comprehensive record of a patient's medical journey. However, a significant portion of this data resides in clinical notes, which consist predominantly of unstructured text. While valuable for consumption by medical professionals, this format presents challenges for traditional data analysis methods. Natural Language Processing (NLP) offers a powerful solution to structure the information presented and unlock the potential of clinical notes. This paper explores the application of NLP tasks within the healthcare domain, specifically focusing on EHR data. We delve into the NLP pipeline, which allows us to differentiate between essential upstream tasks like tokenization and downstream tasks like named entity recognition (NER) and relation extraction. We showcase how NLP can extract crucial clinical information through these tasks and also emphasize the importance of de-identification for maintaining patient privacy. A major challenge in NLP for healthcare is the limited availability of labeled clinical data. We discuss this bottleneck and explore potential solutions like active learning and transfer learning. Finally, the paper highlights the transformative potential of NLP in healthcare data processing and paves the way for future advancements in this dynamic field.
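The distinction between an upstream task (tokenization) and a downstream task (named entity recognition) can be illustrated with a very small pipeline. The sketch below uses a regular-expression tokenizer and a dictionary lookup in place of a trained NER model; the lexicon, labels, and example sentence are hypothetical and not drawn from the paper.

```python
import re

# Hypothetical lexicon mapping lowercase terms to entity labels.
LEXICON = {
    "insomnia": "SYMPTOM",
    "cannabis": "SUBSTANCE",
    "sertraline": "MEDICATION",
}


def tokenize(text):
    """Upstream step: split text into word tokens and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_entities(tokens, lexicon=LEXICON):
    """Downstream step: label tokens found in the lexicon (case-insensitive)."""
    return [
        (token, lexicon[token.lower()])
        for token in tokens
        if token.lower() in lexicon
    ]


tokens = tokenize("Reports insomnia; started sertraline.")
entities = tag_entities(tokens)
```

The downstream step consumes only the token list, so the tokenizer can be swapped for a clinical one without touching the tagger. A dictionary lookup misses multiword terms and spelling variants, which is one reason trained models are used in practice.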