Mining electronic health records: towards better research applications and clinical care