From patient care to research: a validation study examining the factors contributing to data quality in a primary care electronic medical record database
Related papers
Annals of Family Medicine, 2014
The Canadian Primary Care Sentinel Surveillance Network (CPCSSN) is Canada's first national chronic disease surveillance system based on electronic health record (EHR) data. The purpose of this study was to develop and validate case definitions and case-finding algorithms used to identify 8 common chronic conditions in primary care: chronic obstructive pulmonary disease (COPD), dementia, depression, diabetes, hypertension, osteoarthritis, parkinsonism, and epilepsy. Using a cross-sectional data validation study design, regional and local CPCSSN networks from British Columbia, Alberta (2), Ontario, Nova Scotia, and Newfoundland participated in validating EHR case-finding algorithms. A random sample of EHR charts was reviewed, oversampling for patients older than 60 years and for those with epilepsy or parkinsonism. Charts were reviewed by trained research assistants and residents who were blinded to the algorithmic diagnosis. Sensitivity, specificity, and positive and negative predictive values were calculated.
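The accuracy metrics named in this abstract all derive from the same 2×2 table comparing the algorithmic diagnosis against the chart review. A minimal sketch, using illustrative counts that are not taken from the study:

```python
# Hypothetical chart-review validation counts (not from CPCSSN):
# tp = algorithm-positive and chart-positive, fp = algorithm-positive only,
# fn = chart-positive only, tn = negative on both.
tp, fp, fn, tn = 90, 10, 5, 895

sensitivity = tp / (tp + fn)  # proportion of true cases the algorithm finds
specificity = tn / (tn + fp)  # proportion of non-cases correctly excluded
ppv = tp / (tp + fp)          # positive predictive value
npv = tn / (tn + fn)          # negative predictive value
```

Oversampling rare conditions (as the study does for epilepsy and parkinsonism) inflates apparent prevalence, so PPV and NPV computed this way would need to be re-weighted to the true population prevalence.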
Systematic reviews, 2017
Primary care electronic medical record (EMR) data are being used for research, surveillance, and clinical monitoring. To broaden the reach and usability of EMR data, case definitions must be specified to identify and characterize important chronic conditions. The purpose of this study is to identify all case definitions for a set of chronic conditions that have been tested and validated in primary care EMR and EMR-linked data. This work will provide a reference list of case definitions, together with their performance metrics, and will identify gaps where new case definitions are needed. We will consider a set of 40 chronic conditions, previously identified as potentially important for surveillance in a review of multimorbidity measures. We will perform a systematic search of the published literature to identify studies that describe case definitions for clinical conditions in EMR data and report the performance of these definitions. We will stratify our search by studies that use E...
electronic Journal of …, 2011
Objectives This paper aims to estimate the reliability of using "principal diagnosis" to identify people with diabetes mellitus (DM), cardiovascular diseases (CVD), and asthma or chronic obstructive pulmonary disease (COPD) in Firstnet, the emergency department (ED) module of the NSW Health Electronic Medical Record (eMR). Methods A list of patients who attended a community hospital ED in 2009 with a specific "principal diagnosis" of DM, CVD, or asthma/COPD, or a diagnosis inferred from possible keywords, was generated from Firstnet. This Firstnet list was compared with a list extracted from the underlying eMR database tables, using similar specific and possible coded terms. Concordance was calculated per episode of care and overall. Patients on the Firstnet list who were admitted had their discharge summaries audited to confirm the principal diagnosis. The proportion of admitted patients correctly identified as having one of the chronic diseases was calculated. Results The Firstnet list contained 2,559 patients with a principal diagnosis of DM, CVD, or asthma/COPD. The episode-level concordance of the Firstnet list with the eMR list was 87% for CVD, 69% for DM, and 38% for asthma/COPD. The audit of the discharge summaries of the admitted Firstnet patients confirmed the diagnosis of DM, asthma/COPD, and CVD for 79%, 66%, and 56% of patients, respectively. Discussion An empirical method to examine the accuracy of the principal diagnosis in Firstnet is described. The incomplete concordance of diagnoses of the selected chronic diseases generated via different modules of the same information system raises doubts about the reliability and quality of the data collected, stored, and used by the eMR. Further research is required to understand the determinants of data quality and to develop tools to automate data quality assessment and management. This is particularly important with the increasing use of the eMR in routine clinical practice and the use of routinely collected clinical data for clinical and research purposes.
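The episode-level concordance described above amounts to the overlap between two patient/episode lists drawn from different modules of the same system. A minimal sketch with hypothetical episode identifiers (the denominator choice, the Firstnet list, is an assumption for illustration):

```python
# Hypothetical episode IDs extracted from two modules of the same eMR.
firstnet = {"e01", "e02", "e03", "e04", "e05"}
emr_tables = {"e02", "e03", "e04", "e06"}

# Concordance relative to the Firstnet list:
# episodes found in both sources / episodes on the Firstnet list.
concordance = len(firstnet & emr_tables) / len(firstnet)
```

A concordance well below 1.0, as the study found for asthma/COPD (38%), signals that the two extraction routes disagree substantially about who has the condition.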
A comparison between physicians and computer algorithms for form CMS-2728 data reporting
Introduction: The CMS-2728 form (Medical Evidence Report) assesses 23 comorbidities chosen to reflect poor outcomes and increased mortality risk. Previous studies have questioned the validity of physician reporting on form CMS-2728. We hypothesize that reporting of comorbidities by computer algorithms identifies more comorbidities than physician completion and is therefore more reflective of underlying disease burden. Methods: We collected data from CMS-2728 forms for all 296 patients who had an incident ESRD diagnosis and received chronic dialysis from 2005 through 2014 at Indiana University outpatient dialysis centers. We analyzed patients' data from electronic medical record systems that collated information from multiple health care sources. Previously utilized algorithms or natural language processing was used to extract data on 10 comorbidities for a period of up to 10 years prior to ESRD incidence. These algorithms incorporate billing codes, prescriptions, and other relevant elements. We compared the presence or unchecked status of these comorbidities on the forms to the presence or absence according to the algorithms. Findings: Computer algorithms reported more comorbidities than physician completion of the forms. This remained true when decreasing the data span to one year and using only a single health center source. The algorithms' determinations were well accepted by a physician panel. Importantly, algorithm use significantly increased the expected deaths and lowered the standardized mortality ratios. Discussion: Computer algorithms showed superior identification of comorbidities for form CMS-2728 and altered standardized mortality ratios. Adapting similar algorithms in available EMR systems may offer more thorough evaluation of comorbidities and improve quality reporting.
BMC Research Notes, 2014
Background: In clinical practice, research, and increasingly health surveillance, planning and costing, there is a need for high quality information to determine comorbidity information about patients. Electronic, routinely collected healthcare data is capturing increasing amounts of clinical information as part of routine care. The aim of this study was to assess the validity of routine hospital administrative data to determine comorbidity, as compared with clinician-based case note review, in a large cohort of patients with chronic kidney disease. Methods: A validation study using record linkage. Routine hospital administrative data were compared with clinician-based case note review comorbidity data in a cohort of 3219 patients with chronic kidney disease. To assess agreement, we calculated prevalence, kappa statistic, sensitivity, specificity, positive predictive value and negative predictive value. Subgroup analyses were also performed.
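The kappa statistic used in this study measures agreement between administrative data and case-note review beyond what chance alone would produce. A minimal sketch for the 2×2 case, with illustrative counts that are not from the study:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table:
    a = positive in both sources, b = admin+/chart-,
    c = admin-/chart+, d = negative in both."""
    n = a + b + c + d
    po = (a + d) / n  # observed agreement
    # chance agreement from the marginal totals of each source
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (po - pe) / (1 - pe)

kappa = cohens_kappa(40, 10, 5, 45)  # hypothetical comorbidity counts
```

Kappa is reported alongside sensitivity and specificity because a high raw agreement can be misleading for rare comorbidities, where both sources agree on "absent" most of the time by chance.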
Methods for identifying 30 chronic conditions: application to administrative data
BMC medical informatics and decision making, 2015
Multimorbidity is common and associated with poor clinical outcomes and high health care costs. Administrative data are a promising tool for studying the epidemiology of multimorbidity. Our goal was to derive and apply a new scheme for using administrative data to identify the presence of chronic conditions and multimorbidity. We identified validated algorithms that use ICD-9-CM/ICD-10 data to ascertain the presence or absence of 40 morbidities. Algorithms with both positive predictive value and sensitivity ≥70% were graded as "high validity"; those with positive predictive value ≥70% and sensitivity <70% were graded as "moderate validity". To show proof of concept, we applied identified algorithms with high to moderate validity to inpatient and outpatient claims and utilization data from 574,409 people residing in Edmonton, Canada during the 2008/2009 fiscal year. Of the 40 morbidities, we identified 30 that could be identified with high to moderate validity.
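The abstract's grading scheme is a simple threshold rule on two performance metrics. A sketch of that rule; the "insufficient" label for algorithms meeting neither criterion is my assumption, since the abstract only defines the high and moderate grades:

```python
def grade_validity(ppv, sensitivity):
    """Grade a case-finding algorithm by the thresholds in the abstract:
    high     = PPV >= 70% and sensitivity >= 70%
    moderate = PPV >= 70% and sensitivity < 70%
    (any other combination is labeled 'insufficient' here, an assumption)."""
    if ppv >= 0.70 and sensitivity >= 0.70:
        return "high"
    if ppv >= 0.70:
        return "moderate"
    return "insufficient"
```

Note the asymmetry: PPV is the gating metric, so an algorithm with excellent sensitivity but PPV below 70% is not graded as usable under this scheme.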
Journal of the American Medical Informatics Association, 2008
To externally validate EPICON, a computerized system for grouping diagnoses from EMRs in general practice into episodes of care. These episodes can be used for estimating morbidity rates. Design: Comparative observational study. Measurements: Morbidity rates from an independent dataset, based on episode-oriented EMRs, were used as the gold standard. The EMRs in this dataset contained diagnoses which were manually grouped by GPs. The authors ungrouped these diagnoses and regrouped them automatically into episodes using EPICON. The authors then used these episodes to estimate morbidity rates that were compared to the gold standard. The differences between the two sets of morbidity rates were calculated and the authors analyzed large as well as structural differences to establish possible causes. Results: In general, the morbidity rates based on EPICON deviate only slightly from the gold standard. Out of 675 diagnoses, 36 (5%) were considered to be deviating diagnoses. The deviating diagnoses showed differences for two main reasons: "differences in rules between the two methods of episode construction" and "inadequate performance of EPICON." Conclusion: The EPICON system performs well for the large majority of the morbidity rates. We can therefore conclude that EPICON is useful for grouping episodes to estimate morbidity rates using EMRs from general practices. Morbidity rates of diseases with a broad range of symptoms should, however, be interpreted cautiously.
Clinical Epidemiology, 2020
Objective: Electronic health record (EHR) data-discontinuity, i.e., receiving care outside of a particular EHR system, may cause misclassification of study variables. We aimed to validate an algorithm to identify patients with high EHR data-continuity to reduce such bias. Materials and Methods: We analyzed data from two EHR systems linked with Medicare claims data from 2007 through 2014, one in Massachusetts (MA, n=80,588) and the other in North Carolina (NC, n=33,207). We quantified EHR data-continuity by the Mean Proportion of Encounters Captured (MPEC) by the EHR system when compared to complete recording in claims data. The prediction model for MPEC was developed in MA and validated in NC. Stratified by predicted EHR data-continuity, we quantified misclassification of 40 key variables by Mean Standardized Differences (MSD) between the proportions of these variables based on EHR data alone vs the linked claims-EHR data. Results: The mean MPEC was 27% in the MA system and 26% in the NC system. The predicted and observed EHR data-continuity were highly correlated (Spearman correlation=0.78 and 0.73, respectively). The misclassification (MSD) of the 40 variables in patients of the predicted high EHR data-continuity cohort was significantly smaller (44%, 95% CI: 40-48%) than that in the remaining population. Discussion: The comorbidity profiles were similar in patients with high vs low EHR data-continuity. Therefore, restricting an analysis to patients with high EHR data-continuity may reduce information bias while preserving the representativeness of the study cohort. Conclusion: We have successfully validated an algorithm that can identify a high EHR data-continuity cohort representative of the source population.
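The MPEC metric is a per-patient capture proportion averaged over the cohort: for each patient, the fraction of their claims-recorded encounters that also appear in the EHR system. A minimal sketch with made-up counts (not from the study):

```python
# Hypothetical per-patient counts:
# (encounters in claims = complete record, subset captured by the EHR system)
patients = [(10, 3), (8, 2), (20, 5)]

proportions = [ehr / claims for claims, ehr in patients]
mpec = sum(proportions) / len(proportions)  # mean proportion of encounters captured
```

A low MPEC (the study observed means around 26-27%) means most care happens outside the EHR system, which is exactly the condition under which EHR-only variable definitions become misclassified.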
Validation of diagnostic codes within medical services claims
Journal of Clinical Epidemiology, 2004
Objectives: Few studies have attempted to validate the diagnostic information contained within medical service claims data, and only a small proportion of these have attempted to do so using the medical chart as a gold standard. The goal of this study is to determine the sensitivity and specificity of medical services claims diagnoses for surveillance of 14 drug-disease contraindications used in drug utilization review, the Charlson comorbidity index, and the Johns Hopkins Adjusted Care Group case-mix profile (ADGs).
BMC Health Services Research
Background: Adverse events (AEs) in acute care hospitals are frequent and associated with significant morbidity, mortality, and costs. Measuring AEs is necessary for quality improvement and benchmarking purposes, but current detection methods lack accuracy, efficiency, and generalizability. The growing availability of electronic health records (EHR) and the development of natural language processing techniques for encoding narrative data offer an opportunity to develop potentially better methods. The purpose of this study is to determine the accuracy and generalizability of using automated methods for detecting three high-incidence and high-impact AEs from EHR data: a) hospital-acquired pneumonia, b) ventilator-associated event, and c) central line-associated bloodstream infection. Methods: This validation study will be conducted among medical, surgical, and ICU patients admitted between 2013 and 2016 to the Centre hospitalier universitaire de Sherbrooke (CHUS) and the McGill University Health Centre (MUHC), which has both French and English sites. A random 60% sample of CHUS patients will be used for model development purposes (cohort 1, development set). Using a random sample of these patients, a reference standard assessment of their medical charts will be performed. Multivariate logistic regression and the area under the curve (AUC) will be employed to iteratively develop and optimize three automated AE detection models (i.e., one per AE of interest) using EHR data from the CHUS. These models will then be validated on a random sample of the remaining 40% of CHUS patients (cohort 1, internal validation set) using chart review to assess accuracy. The most accurate models developed and validated at the CHUS will then be applied to EHR data from a random sample of patients admitted to the MUHC French site (cohort 2) and English site (cohort 3), a critical requirement given the use of narrative data, and accuracy will be assessed using chart review.
Generalizability will be determined by comparing AUCs from cohorts 2 and 3 to those from cohort 1.
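The AUC used for model development and the generalizability comparison has a simple probabilistic reading: the chance that a randomly chosen true AE case receives a higher model score than a randomly chosen non-case. A minimal sketch of that rank-based (Mann-Whitney) formulation, with ties counted as half:

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); ties contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Comparing AUCs across cohorts 2 and 3 against cohort 1, as the protocol describes, tests whether the score's ranking ability survives the shift to a different site and a different documentation language.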