Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases
Comparative Study
Epub 2011 Oct 22.
- PMID: 22195222
- PMCID: PMC3243156
Hua Xu et al. AMIA Annu Symp Proc. 2011.
Abstract
Identifying a cohort of patients with a specific disease is an important step in clinical research based on electronic health records (EHRs). Informatics approaches that combine structured EHR data, such as billing records, with narrative text data have demonstrated utility for such tasks. This paper describes an algorithm that combines machine learning and natural language processing to detect patients with colorectal cancer (CRC) from entire EHRs at Vanderbilt University Hospital. We developed a general case detection method that consists of two steps: 1) extraction of positive CRC concepts from all clinical notes (document-level concept identification); and 2) determination of CRC cases using aggregated information from both clinical narratives and structured billing data (patient-level case determination). For each step, we compared the performance of rule-based and machine-learning-based approaches. Using a manually reviewed data set of 300 possible CRC patients (150 for training and 150 for testing), we showed that our method achieved F-measures of 0.996 for document-level concept identification and 0.93 for patient-level case detection.
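The two-step structure described in the abstract can be illustrated with a minimal rule-based sketch. It is not the authors' implementation: the term pattern, the negation cues, the ICD-9-CM prefixes (153, 154), the `min_notes` threshold, and all function names are hypothetical choices made only for illustration, and the paper also evaluates machine-learning variants that are not shown here.

```python
import re

# Hypothetical CRC term pattern and negation cues (assumptions, not the paper's lexicon).
CRC_TERMS = re.compile(
    r"\b(colorectal|colon|rectal) (cancer|carcinoma|adenocarcinoma)\b", re.I
)
NEGATION = re.compile(r"\b(no|denies|negative for|without|rule out)\b[^.]*$", re.I)


def note_has_positive_crc(text):
    """Step 1 (document level): True if the note contains a CRC mention
    that is not preceded by a negation cue in the same sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in CRC_TERMS.finditer(sentence):
            if not NEGATION.search(sentence[: match.start()]):
                return True
    return False


def is_crc_case(notes, billing_codes, min_notes=2):
    """Step 2 (patient level): combine note-level evidence with billing data.
    The decision rule and the ICD-9-CM prefixes are illustrative assumptions."""
    positive_notes = sum(note_has_positive_crc(n) for n in notes)
    has_code = any(code.startswith(("153", "154")) for code in billing_codes)
    return positive_notes >= min_notes or (positive_notes >= 1 and has_code)


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch, `is_crc_case(notes, billing_codes)` would be called once per patient with all of that patient's notes and billing codes, and `f_measure` would compare the resulting labels against a manually reviewed gold standard.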
Figures
Figure 1. Overview of the 2-step case detection method from EHR.
Figure 2. The workflow of creation of the gold standard data set containing 300 patients with labels indicating their CRC status (Yes or No).
Figure 3. Distribution of CRC concepts among different types of notes, including DS – discharge summaries, CC – clinical communications, FORM – clinical forms, RAD – radiology notes, PATH – pathology notes, PL – patient summary lists, and OTHER – other clinical notes (History & Physicals, clinic notes, progress notes).
Grants and funding
- R01 CA141307/CA/NCI NIH HHS/United States
- LM008635/LM/NLM NIH HHS/United States
- R01 LM008635/LM/NLM NIH HHS/United States
- 1UL1RR024975-01/RR/NCRR NIH HHS/United States
- UL1 RR024975/RR/NCRR NIH HHS/United States
- R01CA141307/CA/NCI NIH HHS/United States
- R01 LM010016/LM/NLM NIH HHS/United States
- LM010016/LM/NLM NIH HHS/United States