Scalable and accurate deep learning with electronic health records

doi: 10.1038/s41746-018-0029-1. eCollection 2018.

Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M Dai, Nissan Hajaj, Michaela Hardt, Peter J Liu, Xiaobing Liu, Jake Marcus, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Yi Zhang, Gerardo Flores, Gavin E Duggan, Jamie Irvine, Quoc Le, Kurt Litsch, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L Volchenboum, Katherine Chou, Michael Pearson, Srinivasan Madabushi, Nigam H Shah, Atul J Butte, Michael D Howell, Claire Cui, Greg S Corrado, Jeffrey Dean

PMID: 31304302
PMCID: PMC6550175
DOI: 10.1038/s41746-018-0029-1

Alvin Rajkomar et al. NPJ Digit Med. 2018.

Abstract

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (area under the receiver operating characteristic curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
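
The AUROC figures quoted above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that definition in plain Python, not the authors' evaluation code. The function names and the toy inputs are hypothetical, and the weighting used in `frequency_weighted_auroc` (each label's AUROC weighted by how often the label occurs) is an assumption about what "frequency-weighted" means here.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def frequency_weighted_auroc(
    per_label_auroc: Dict[str, float], label_counts: Dict[str, int]
) -> float:
    """Average per-label AUROCs, weighting each label by its count (assumed scheme)."""
    total = sum(label_counts[k] for k in per_label_auroc)
    return sum(per_label_auroc[k] * label_counts[k] for k in per_label_auroc) / total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be the usual choice at the scale described in the abstract.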

Keywords: Machine learning; Medical research.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1

This boxplot displays the amount of data (on a log scale) in the EHR, along with its temporal variation across the course of an admission. We define a token as a single data element in the electronic health record, like a medication name, at a specific point in time. Each token is considered as a potential predictor by the deep learning model. The line within the boxplot represents the median, the box represents the interquartile range (IQR), and the whiskers extend to 1.5 times the IQR. The number of tokens increased steadily from admission to discharge. At discharge, the median number of tokens for Hospital A was 86,477 and for Hospital B was 122,961.
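
The boxplot elements named in this caption (median, IQR box, whiskers at 1.5 times the IQR) can be computed directly from a list of per-patient token counts. This is a small sketch using only the Python standard library. It assumes the common Tukey convention, in which each whisker ends at the most extreme observation still inside the 1.5 x IQR fence; the caption does not say which convention the figure uses. The function name and sample values are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def boxplot_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median, quartiles, and whisker ends for one box of a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences
    # (assumed convention); anything beyond would be drawn as an outlier.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


if __name__ == "__main__":
    # Invented token counts for a handful of admissions.
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```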

Fig. 2

The areas under the receiver operating characteristic curves are shown for predictions of inpatient mortality made by deep learning and baseline models at 12 h increments before and after hospital admission. For inpatient mortality, the deep learning model achieves higher discrimination at every prediction time compared to the baseline for both the University of California, San Francisco (UCSF) and University of Chicago Medicine (UCM) cohorts. Both models improve in the first 24 h, but the deep learning model achieves a similar level of accuracy approximately 24 h earlier for UCM and even 48 h earlier for UCSF. The error bars represent the bootstrapped 95% confidence interval.
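
A bootstrapped 95% confidence interval for an AUROC, as the error bars describe, can be obtained by resampling cases with replacement and recomputing the metric each time. The sketch below uses the percentile method, which is an assumption; the caption does not state which bootstrap variant was used. All names, defaults, and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC (assumed variant)."""
    auroc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    estimates: List[float] = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # AUROC is undefined when a resample has one class only
        estimates.append(auroc(sample_labels, [scores[i] for i in idx]))
    estimates.sort()
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high


if __name__ == "__main__":
    # Invented labels and scores, for illustration only.
    y = [0, 0, 0, 1, 1, 1, 0, 1]
    s = [0.1, 0.3, 0.6, 0.4, 0.8, 0.9, 0.2, 0.7]
    print(bootstrap_auroc_ci(y, s, n_resamples=200))
```

Fixing the seed makes the interval reproducible; repeating the procedure at each 12 h prediction time would give one interval per plotted point.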

Fig. 3

The patient record shows a woman with metastatic breast cancer with malignant pleural effusions and empyema. The patient timeline at the top of the figure contains circles for every time-step for which at least a single token exists for the patient, and the horizontal lines show the data type. There is a close-up view of the most recent data points immediately preceding a prediction made 24 h after admission. We trained models for each data type and highlighted in red the tokens to which the models attended; the non-highlighted text was not attended to but is shown for context. The models pick up features in the medications, nursing flowsheets, and clinical notes relevant to the prediction.

Fig. 4

Data from each health system were mapped to an appropriate FHIR (Fast Healthcare Interoperability Resources) resource and placed in temporal order. This conversion did not harmonize or standardize the data from each health system other than mapping them to the appropriate resource. The deep learning model could use all data available prior to the point when the prediction was made. Therefore, each prediction, regardless of the task, used the same data.
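
The two steps described in this caption, placing resources in temporal order and restricting the model's input to data recorded before the prediction time, can be sketched as follows. The `Event` class is a hypothetical, simplified stand-in for a FHIR resource and is not the schema used in the paper; the resource type strings and tokens are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one timestamped FHIR resource."""
    resource_type: str  # e.g. "Observation"; illustrative label only
    timestamp: datetime
    token: str  # raw, unharmonized value as recorded by the site


def timeline_before(events: List[Event], prediction_time: datetime) -> List[Event]:
    """Order events by time and keep only those strictly before the prediction."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [e for e in ordered if e.timestamp < prediction_time]


if __name__ == "__main__":
    # Invented events for a single admission.
    events = [
        Event("Observation", datetime(2018, 1, 2, 9, 0), "heart rate 112"),
        Event("Encounter", datetime(2018, 1, 1, 8, 0), "admission"),
        Event("MedicationRequest", datetime(2018, 1, 1, 10, 30), "vancomycin"),
    ]
    # Prediction made 24 h after admission: the 09:00 observation is excluded.
    for e in timeline_before(events, datetime(2018, 1, 2, 8, 0)):
        print(e.timestamp.isoformat(), e.resource_type, e.token)
```

Because the filter depends only on the prediction time and not on the task, every task sees the same input sequence, which matches the caption's last sentence.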

Cited by

Survival models and longitudinal medical events for hospital readmission forecasting.
Davis S, Greiner R. BMC Health Serv Res. 2024 Nov 13;24(1):1394. doi: 10.1186/s12913-024-11771-w. PMID: 39538197. Free PMC article.
Multisource representation learning for pediatric knowledge extraction from electronic health records.
Li M, Li X, Pan K, Geva A, Yang D, Sweet SM, Bonzel CL, Ayakulangara Panickan V, Xiong X, Mandl K, Cai T. NPJ Digit Med. 2024 Nov 13;7(1):319. doi: 10.1038/s41746-024-01320-4. PMID: 39533050. Free PMC article.
Early prediction of hypertensive disorders of pregnancy toward preventive early intervention.
Mizuno S, Nagaie S, Sugawara J, Tamiya G, Obara T, Ishikuro M, Kuriyama S, Yaegashi N, Tanaka H, Yamamoto M, Ogishima S. AJOG Glob Rep. 2024 Jul 27;4(4):100383. doi: 10.1016/j.xagr.2024.100383. eCollection 2024 Nov. PMID: 39524694. Free PMC article.
Ten challenges and opportunities in computational immuno-oncology.
Bao R, Hutson A, Madabhushi A, Jonsson VD, Rosario SR, Barnholtz-Sloan JS, Fertig EJ, Marathe H, Harris L, Altreuter J, Chen Q, Dignam J, Gentles AJ, Gonzalez-Kozlova E, Gnjatic S, Kim E, Long M, Morgan M, Ruppin E, Valen DV, Zhang H, Vokes N, Meerzaman D, Liu S, Van Allen EM, Xing Y. J Immunother Cancer. 2024 Oct 26;12(10):e009721. doi: 10.1136/jitc-2024-009721. PMID: 39461879. Free PMC article. Review.
Applicability of the adjusted morbidity groups algorithm for healthcare programming: results of a pilot study in Italy.
Papa R, Balducci F, Franceschini G, Pompili M, De Marco M, Roca J, González-Colom R, Monterde D. BMC Public Health. 2024 Oct 17;24(1):2869. doi: 10.1186/s12889-024-20398-9. PMID: 39420326. Free PMC article.
