Semi-supervised learning of the electronic health record for phenotype stratification
Brett K Beaulieu-Jones et al. J Biomed Inform. 2016 Dec.
Free article
Abstract
Patient interactions with health care providers result in entries to electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes, which can be paired with genetic data to identify genes underlying common human diseases. This task remains challenging: high quality phenotyping is costly and requires physician review; many fields in the records are sparsely filled; and our definitions of diseases are continuing to improve over time. Here we develop and evaluate a semi-supervised learning method for EHR phenotype extraction using denoising autoencoders for phenotype stratification. By combining denoising autoencoders with random forests we find classification improvements across multiple simulation models and improved survival prediction in ALS clinical trial data. This is particularly evident in cases where only a small number of patients have high quality phenotypes, a common scenario in EHR-based research. Denoising autoencoders perform dimensionality reduction enabling visualization and clustering for the discovery of new subtypes of disease. This method represents a promising approach to clarify disease subtypes and improve genotype-phenotype association studies that leverage EHRs.
Keywords: Denoising autoencoder; Disease subtyping; Electronic health record; Electronic phenotyping; Patient stratification; Unsupervised.
Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
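The denoising step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DenoisingAutoencoder`, the helper `make_patients`, the synthetic two-subtype binary data, the masking-noise rate, and all hyperparameters are assumptions chosen only to keep the example small and dependency-free. The network is trained to reconstruct the clean record from a randomly masked copy, and its hidden layer then serves as a low-dimensional representation of each patient.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DenoisingAutoencoder:
    """One-hidden-layer denoising autoencoder trained with plain SGD (illustrative only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        # Separate encoder (W1) and decoder (W2) weights, small random init.
        self.W1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def corrupt(self, x, p):
        # Masking noise: each input is set to zero with probability p.
        return [0.0 if self.rng.random() < p else xi for xi in x]

    def loss(self, data):
        # Mean squared reconstruction error on the clean (uncorrupted) inputs.
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)
        return total / len(data)

    def fit(self, data, epochs=30, lr=0.5, p=0.2):
        for _ in range(epochs):
            for x in data:
                x_noisy = self.corrupt(x, p)
                h = self.encode(x_noisy)
                y = self.decode(h)
                # Backpropagate squared error; the target is the CLEAN input x.
                d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
                d_hid = [sum(d_out[i] * self.W2[i][j] for i in range(self.n_in))
                         * h[j] * (1 - h[j]) for j in range(self.n_hidden)]
                for i in range(self.n_in):
                    for j in range(self.n_hidden):
                        self.W2[i][j] -= lr * d_out[i] * h[j]
                    self.b2[i] -= lr * d_out[i]
                for j in range(self.n_hidden):
                    for i in range(self.n_in):
                        self.W1[j][i] -= lr * d_hid[j] * x_noisy[i]
                    self.b1[j] -= lr * d_hid[j]
        return self


def make_patients(n, seed=1):
    """Synthetic binary records drawn from two hypothetical subtypes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        subtype = rng.randrange(2)
        probs = [0.9] * 4 + [0.1] * 4 if subtype == 0 else [0.1] * 4 + [0.9] * 4
        rows.append([1.0 if rng.random() < q else 0.0 for q in probs])
    return rows


patients = make_patients(200)
dae = DenoisingAutoencoder(n_in=8, n_hidden=2)
loss_before = dae.loss(patients)
dae.fit(patients)
loss_after = dae.loss(patients)
codes = [dae.encode(x) for x in patients]  # 2-D representation per patient
print(f"reconstruction MSE: {loss_before:.3f} -> {loss_after:.3f}")
```

In a semi-supervised setting of the kind the abstract describes, the autoencoder would be fit on all records, labeled or not, and the resulting `codes` for the small labeled subset would then be passed to a supervised classifier such as a random forest. That downstream step is omitted here to keep the sketch free of third-party libraries. Because the hidden layer has only two units, the same `codes` could also be plotted or clustered directly.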