Do no harm: a roadmap for responsible machine learning for health care

Change history

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Acknowledgements

The authors would like to thank the participants in the MLHC Conference 2018 (http://www.mlforhc.org), specifically the organizers and participants of the pre-meeting workshop that served as the genesis for this manuscript, for providing valuable feedback on the initial ideas through a panel discussion.

Author information

Author notes

  1. These authors contributed equally: Jenna Wiens, Suchi Saria, Anna Goldenberg.

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
    Jenna Wiens
  2. Departments of Computer Science and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
    Suchi Saria
  3. Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
    Suchi Saria
  4. Bayesian Health, New York, NY, USA
    Suchi Saria
  5. Duke Institute for Health Innovation, Duke University School of Medicine, Durham, NC, USA
    Mark Sendak
  6. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
    Marzyeh Ghassemi & Anna Goldenberg
  7. Department of Medicine, University of Toronto, Toronto, Ontario, Canada
    Marzyeh Ghassemi
  8. Vector Institute, Toronto, Ontario, Canada
    Marzyeh Ghassemi & Anna Goldenberg
  9. Kaiser Permanente Division of Research, Oakland, CA, USA
    Vincent X. Liu
  10. School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA
    Finale Doshi-Velez
  11. Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
    Kenneth Jung
  12. Google Inc., Mountain View, CA, USA
    Katherine Heller
  13. Department of Statistical Science, Duke University, Durham, NC, USA
    Katherine Heller
  14. Information Sciences Institute, University of Southern California, Los Angeles, CA, USA
    David Kale
  15. Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
    Mohammed Saeed
  16. Law School, University of Wisconsin–Madison, Madison, WI, USA
    Pilar N. Ossorio
  17. Presence and Program in Bedside Medicine, Stanford University School of Medicine, Stanford, CA, USA
    Sonoo Thadaney-Israni
  18. SickKids Research Institute, Toronto, Ontario, Canada
    Anna Goldenberg
  19. Child and Brain Development Program, CIFAR, Toronto, Ontario, Canada
    Anna Goldenberg

Authors

  1. Jenna Wiens
  2. Suchi Saria
  3. Mark Sendak
  4. Marzyeh Ghassemi
  5. Vincent X. Liu
  6. Finale Doshi-Velez
  7. Kenneth Jung
  8. Katherine Heller
  9. David Kale
  10. Mohammed Saeed
  11. Pilar N. Ossorio
  12. Sonoo Thadaney-Israni
  13. Anna Goldenberg

Corresponding authors

Correspondence to Jenna Wiens or Anna Goldenberg.

Ethics declarations

Competing interests

J.W., F.D.-V., D.K. and K.J. are on the board of Machine Learning for Healthcare, a non-profit organization that hosts a yearly academic meeting; they are reimbursed for registration and travel expenses. F.D.-V. consults for DaVita, a healthcare company. S.T.-I. serves on the board of Scients (https://scients.org/) and is reimbursed for travel expenses. S.S. is a founder of, and holds equity in, Bayesian Health. The results of the study discussed in this publication could affect the value of Bayesian Health. This arrangement has been reviewed and approved by Johns Hopkins University in accordance with its conflict-of-interest policies. S.S. is a member of the scientific advisory board for PatientPing. M. Sendak is a named inventor of the Sepsis Watch deep-learning model, which was licensed from Duke University by Cohere Med, Inc. M. Sendak does not hold any equity in Cohere Med, Inc. M. Saeed is a founder and Chief Medical Officer at HEALTH at SCALE Technologies and holds equity in this company. P.O. consults for Roche-Genentech, from whom she has received travel reimbursement and consulting fees of less than $4,000/year. A.G., K.H., M.G. and V.L. have no conflicts to declare.

Additional information

Peer review information: Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wiens, J., Saria, S., Sendak, M. et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 25, 1337–1340 (2019). https://doi.org/10.1038/s41591-019-0548-6