Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study
Related papers
Identification of Congestive Heart Failure Patients Through Natural Language Processing
Transactions on Computer Systems and Networks, 2021
Research in the biomedical field requires technical infrastructure to deal with heterogeneous, multi-sourced big data. Biomedical informatics primarily uses data in electronic health records to better understand how diseases spread or to gather new insights from patient histories. One of the prominent use cases of electronic health records is identification of a patient cohort (group) with a specific disease or common characteristics, so that useful inferences may be drawn from these records. This paper proposes a methodology for identifying and analyzing a cohort of patients with congestive heart failure among patients with obesity. This may help doctors and medical researchers in outcome prediction, survival analysis, clinical trials, and other types of retrospective studies. The cTAKES tool was used to apply natural language processing to identify patients belonging to a particular cohort. All clinical terms were identified and mapped to their matching terms in the UMLS Metathesaurus, and negated statements were detected and excluded from the final cohort. The method is largely automated and achieves accuracy, precision, recall, and F-score values of 0.970, 0.972, 0.958, and 0.965, respectively. Results were compared against expert annotations, and a manual review of clinical records was performed for further validation.
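As a rough illustration of the evaluation reported above (not code from the paper), the following Python sketch computes accuracy, precision, recall, and F-score for binary cohort-membership decisions against expert annotations; the toy labels are placeholders.

```python
# Hedged sketch (not from the paper): computing the metrics the abstract
# reports (accuracy, precision, recall, F-score) for a binary
# cohort-membership decision against expert annotations.
from typing import Sequence

def cohort_metrics(predicted: Sequence[bool], gold: Sequence[bool]) -> dict:
    """Compare NLP cohort assignments against expert annotations."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    tn = sum(not p and not g for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(gold)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Example: four patients, the NLP system flags three as CHF cohort members.
print(cohort_metrics([True, True, True, False], [True, True, False, False]))
```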
In this paper, we proposed two different approaches, a rule-based approach and a machine-learning-based approach, to identify active heart failure cases automatically by analyzing electronic health records (EHRs). For the rule-based approach, we extracted cardiovascular data elements from clinical notes and matched patients to different colors according to their heart failure condition, using rules provided by heart failure experts; it achieved 69.4% accuracy and an F1-score of 0.729. For the machine learning approach, with bigrams of clinical notes as features, we tried four different models, of which an SVM with a linear kernel achieved the best performance, with 87.5% accuracy and an F1-score of 0.86. The comparison across the four models also suggests that linear models fit this problem better. Once we combine the machine learning and rule-based algorithms, we will enable hospital-wide surveillance of active heart failure through increased accuracy and interpretability of the outputs.
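To make the machine learning approach concrete, here is a minimal sketch (not the authors' code) of a bigram-feature, linear-kernel SVM note classifier using scikit-learn; the example notes, labels, and pipeline step names are illustrative assumptions.

```python
# Hedged sketch: bigram bag-of-words features feeding a linear-kernel SVM,
# the configuration the abstract reports as best-performing.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

notes = [
    "patient with acute decompensated heart failure, reduced ejection fraction",
    "no evidence of heart failure; routine follow-up for hypertension",
    "chronic systolic heart failure, on furosemide and lisinopril",
    "chest pain ruled out, normal echocardiogram, no volume overload",
]
labels = [1, 0, 1, 0]  # 1 = active heart failure, 0 = not (placeholder labels)

clf = Pipeline([
    ("bigrams", CountVectorizer(ngram_range=(2, 2), lowercase=True)),
    ("svm", LinearSVC()),
])
clf.fit(notes, labels)
print(clf.predict(["worsening heart failure with volume overload"]))
```

In practice the feature matrix would be built from thousands of notes and evaluated with held-out data, as in the study.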
Clinical Cardiology, 2021
Objective: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor-intensive and expensive, the adoption of electronic health records enables computational analysis of free-text documentation using natural language processing (NLP) tools. Hypothesis: We sought to develop highly accurate NLP modules to assess for the presence of five key cardiovascular comorbidities in a large electronic health record system. Methods: One thousand clinical notes were randomly selected from a cardiovascular registry at Mass General Brigham. Trained physicians manually adjudicated these notes for the following five diagnostic comorbidities: hypertension, dyslipidemia, diabetes, coronary artery disease, and stroke/transient ischemic attack. Using the open-source Canary NLP system, five separate NLP modules were designed based on 800 "training-set" notes and validated on 200 "test-set" notes. Results: Across the five NLP modules, the sentence-level and note-level sensitivity, specificity, and positive predictive value were always greater than 85% and most often greater than 90%. Accuracy tended to be highest for conditions with greater diagnostic clarity (e.g., diabetes and hypertension) and slightly lower for conditions whose diagnostic challenges (e.g., myocardial infarction and embolic stroke) may lead to less definitive documentation. Conclusion: We designed five open-source and highly accurate NLP modules that can be used to assess for the presence of important cardiovascular comorbidities in free-text health records. These modules have been placed in the public domain and can be used for clinical research, trial recruitment, and population management. (Alexander Turchin and Ron Blankstein contributed equally to this study.)
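For intuition only, the sketch below shows the kind of sentence-level trigger-phrase and negation logic such comorbidity modules encode; it is not Canary itself, and the phrase lists are assumed for illustration.

```python
# Hedged sketch: a toy sentence-level detector for one comorbidity
# (hypertension) with a crude negation check. Real modules use richer
# grammars and validated phrase sets.
import re

TRIGGERS = re.compile(r"\b(hypertension|htn|elevated blood pressure)\b", re.I)
NEGATIONS = re.compile(r"\b(no|denies|without|negative for)\b[^.]{0,30}$", re.I)

def sentence_flags_hypertension(sentence: str) -> bool:
    m = TRIGGERS.search(sentence)
    if not m:
        return False
    # Crude negation check: a negation cue shortly before the trigger phrase.
    preceding = sentence[:m.start()]
    return NEGATIONS.search(preceding) is None

for s in ["Past medical history: hypertension, dyslipidemia.",
          "Patient denies hypertension or diabetes."]:
    print(s, "->", sentence_flags_hypertension(s))
```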
JMIR medical informatics, 2018
Background: We developed an accurate, stakeholder-informed, automated, natural language processing (NLP) system to measure the quality of heart failure (HF) inpatient care, and explored the potential for adoption of this system within an integrated health care system. Objective: To accurately automate a United States Department of Veterans Affairs (VA) quality measure for inpatients with HF. Methods: We automated the HF quality measure Congestive Heart Failure Inpatient Measure 19 (CHI19) that identifies whether a given patient has left ventricular ejection fraction (LVEF) <40%, and if so, whether an angiotensin-converting enzyme inhibitor or angiotensin-receptor blocker was prescribed at discharge if there were no contraindications. We used documents from 1083 unique inpatients from eight VA medical centers to develop a reference standard (RS) to train (n=314) and test (n=769) the Congestive Heart Failure Information Extraction Framework (CHIEF). We also conducted semi-structured interviews (n=15) for stakeholder feedback on implementation of the CHIEF.
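As a hedged illustration of the first step of this measure (not CHIEF itself), the sketch below extracts an LVEF value from free text and checks the <40% criterion; the regular expression and example notes are assumptions, not the framework's actual rules.

```python
# Hedged sketch: pull an LVEF percentage out of note text and test the
# <40% threshold used by the CHI19 measure described above.
import re

LVEF_PATTERN = re.compile(
    r"(?:LVEF|ejection fraction|EF)\D{0,15}?(\d{1,2})\s*(?:-\s*\d{1,2}\s*)?%",
    re.IGNORECASE)

def lvef_below_40(note: str):
    """Return (lvef_value, meets_criterion) or None if no LVEF is found."""
    m = LVEF_PATTERN.search(note)
    if not m:
        return None
    value = int(m.group(1))  # lower bound of a range like "30-35%"
    return value, value < 40

print(lvef_below_40("Echo today: LVEF 30-35%, moderate MR."))
print(lvef_below_40("Ejection fraction estimated at 55%."))
```

A full implementation would also resolve ranges, qualitative descriptions ("severely reduced"), and the discharge-medication half of the measure.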
Summit on translational bioinformatics, 2008
Informatics tools to extract and analyze clinical information on patients have lagged behind data-mining developments in bioinformatics. While analysis of an individual's partial or complete genotype is nearly a reality, the phenotypic characteristics that accompany the genotype are not well known and are largely inaccessible in free-text patient health records. As the adoption of electronic medical records increases, there exists an urgent need to extract pertinent phenotypic information and make it available to clinicians and researchers. This usually requires the data to be in a structured format that is both searchable and amenable to computation. Using inflammatory bowel disease as an example, this study demonstrates the utility of a natural language processing system (MedLEE) in mining clinical notes in the paperless VA Health Care System. This adaptation of MedLEE is useful for identifying patients with specific clinical conditions, those at risk for or those with sympt...
Journal of Clinical Medicine
Patients with Type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD) are at high risk of developing major adverse cardiovascular events (MACE). This is a multicenter, retrospective, observational study performed in Spain that aimed to characterize these patients in a real-world setting. Unstructured data from the Electronic Health Records were extracted by EHRead®, a technology based on Natural Language Processing and machine learning. The association between new MACE and the variables of interest was investigated by univariable and multivariable analyses. From a source population of 2,184,662 patients, we identified 4072 adults diagnosed with T2DM and CAD (62.2% male, mean age 70 ± 11 years). The main comorbidities observed included arterial hypertension, hyperlipidemia, and obesity, with metformin and statins being the most frequently prescribed treatments. MACE development was associated with multivessel (Hazard Ratio (HR) = 2.49) and single coronary vessel disease (HR = 1.7...
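For readers unfamiliar with the statistical step, the sketch below fits a multivariable Cox proportional hazards model with the lifelines library to obtain hazard ratios of the kind reported above; the tiny synthetic data frame and column names are placeholders, not study data.

```python
# Hedged sketch: multivariable Cox regression yielding hazard ratios
# (exp(coef)) for candidate predictors of new MACE.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "followup_years": [2.1, 3.5, 1.2, 4.0, 0.8, 2.9, 1.5, 3.1],
    "mace":           [1,   0,   1,   0,   1,   0,   0,   1],  # 1 = event
    "multivessel":    [1,   0,   1,   1,   0,   0,   1,   1],
    "age":            [72,  65,  80,  74,  66,  68,  70,  77],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="mace")
print(cph.hazard_ratios_)  # hazard ratio for each covariate
```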
2015
Narrative information in Electronic Health Records (EHRs) and literature articles contains a wealth of clinical information about treatment, diagnosis, medication and family history. This often includes detailed phenotype information for specific diseases, which in turn can help to identify risk factors and thus determine the susceptibility of different patients. Such information can help to improve healthcare applications, including Clinical Decision Support Systems (CDS). Clinical text mining (TM) tools can provide efficient automated means to extract and integrate vital information hidden within the vast volumes of available text. Development or adaptation of TM tools is reliant on the availability of annotated training corpora, although few such corpora exist for the clinical domain. In response, we have created a new annotated corpus (PhenoCHF), focussing on the identification of phenotype information for a specific clinical sub-domain, i.e., congestive heart failure (CHF). The...
AMIA Annual Symposium Proceedings, 2006
We have developed a natural language processing system for extracting and coding clinical data from free-text reports. The system is designed to be easily modified and adapted to a variety of free-text clinical reports, such as admission notes, radiology and pathology reports, and discharge summaries. This report presents the results of applying the system to extract and code clinical concepts related to congestive heart failure from 39,000 chest radiology reports. The system detects the presence or absence of six concepts: congestive heart failure, Kerley B lines, cardiomegaly, prominent pulmonary vasculature, pulmonary edema, and pleural effusion. We compared its output to a gold standard consisting of specially trained human coders as well as an experienced physician. Results indicate that the system had high specificity, recall, and precision for each of the concepts it is designed to detect.
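As a toy stand-in for the system described (which uses full NLP parsing rather than keyword rules), the sketch below flags the six chest radiography concepts as present, absent, or not mentioned; the keyword patterns and negation cue list are assumptions.

```python
# Hedged sketch: keyword spotting for the six chest x-ray concepts the
# abstract lists, with a crude absence check.
import re

CONCEPTS = {
    "congestive heart failure": r"congestive heart failure|\bchf\b",
    "kerley b lines": r"kerley b lines?",
    "cardiomegaly": r"cardiomegaly|enlarged (cardiac silhouette|heart)",
    "prominent pulmonary vasculature": r"prominent pulmonary vasculat\w+",
    "pulmonary edema": r"pulmonary edema",
    "pleural effusion": r"pleural effusions?",
}
NEGATION = r"\b(no|without|resolved|negative for)\b[^.]{0,40}"

def code_report(report: str) -> dict:
    """Return concept -> 'present' / 'absent' / 'not mentioned'."""
    results = {}
    for concept, pattern in CONCEPTS.items():
        if not re.search(pattern, report, re.IGNORECASE):
            results[concept] = "not mentioned"
        elif re.search(NEGATION + "(?:" + pattern + ")", report, re.IGNORECASE):
            results[concept] = "absent"
        else:
            results[concept] = "present"
    return results

print(code_report("Cardiomegaly with small bilateral pleural effusions. "
                  "No pulmonary edema."))
```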
Discovering and identifying New York heart association classification from electronic health records
BMC medical informatics and decision making, 2018
Cardiac Resynchronization Therapy (CRT) is an established pacing therapy for heart failure patients. The New York Heart Association (NYHA) class is often used as a measure of a patient's response to CRT. Identifying NYHA class for heart failure (HF) patients in an electronic health record (EHR) consistently over time can provide a better understanding of the progression of heart failure and support assessment of CRT response and effectiveness. Though NYHA class is rarely stored in EHR structured data, such information is often documented in unstructured clinical notes. We accessed HF patients' data in a local EHR system and identified potential sources of NYHA class, including local diagnosis codes, procedures, and clinical notes. We further investigated and compared the performance of rule-based versus machine learning-based natural language processing (NLP) methods to identify NYHA class from clinical notes. Of the 36,276 patients with a diagnosis of HF or a CRT implant, 19.2% had NYHA class...
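To illustrate the rule-based side of this comparison (not the paper's actual pipeline), the sketch below extracts NYHA class mentions from note text with a regular expression; the pattern and example sentences are assumptions.

```python
# Hedged sketch: rule-based extraction of NYHA class (I-IV or 1-4) from
# clinical note text.
import re

NYHA_PATTERN = re.compile(
    r"\bNYHA\b[^.\n]{0,20}?\bclass\b\s*(I{1,3}V?|IV|[1-4])", re.IGNORECASE)
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}

def extract_nyha_class(note: str):
    """Return the NYHA class (1-4) mentioned in a note, or None."""
    m = NYHA_PATTERN.search(note)
    if not m:
        return None
    token = m.group(1).upper()
    return ROMAN.get(token, int(token) if token.isdigit() else None)

print(extract_nyha_class("Dyspnea on exertion, NYHA functional class III."))
print(extract_nyha_class("HFrEF, NYHA class 2, on GDMT."))
```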