A pathology foundation model for cancer diagnosis and prognosis prediction

Data availability

This work utilized 16 pathology datasets from large research consortia, including TCGA (https://portal.gdc.cancer.gov), GTEx (https://www.gtexportal.org/home/), PAIP (http://www.wisepaip.org/paip), PANDA (https://www.kaggle.com/c/prostate-cancer-grade-assessment), BCC (https://datahub.aida.scilifelab.se/10.23698/aida/bccc), ACROBAT (https://doi.org/10.48723/w728-p041), BCNB (https://bcnb.grand-challenge.org/), TOC (https://www.cancerimagingarchive.net/collection/ovarian-bevacizumab-response/), CPTAC (https://portal.gdc.cancer.gov), DROID-Breast (https://datahub.aida.scilifelab.se/10.23698/aida/drbr), Dataset-PT (https://github.com/CSU-BME/pathology_SSL), Diagset-B (https://github.com/michalkoziarski/DiagSet), MUV (https://doi.org/10.25493/WQ48-ZGX) and PLCO (https://cdas.cancer.gov/plco/). The other two datasets, PAIP2020 and TissueNet, can be requested from the respective data science challenge organizers: PAIP2020 (https://paip2020.grand-challenge.org/) and TissueNet (https://www.drivendata.org/competitions/67/competition-cervical-biopsy/). Supplementary Table 22 provides the links to the raw data from these sources. We obtained institutional data for CHIEF pretraining and validation from DFCI, BWH, YH, SMCH, CUCH and the Hospital of the University of Pennsylvania. These data are not publicly available owing to patient privacy obligations and institutional review board and data use agreement requirements. Researchers may obtain de-identified data directly from DFCI, BWH, YH, SMCH, CUCH and the Hospital of the University of Pennsylvania upon reasonable request and subject to institutional ethical approvals. Data access enquiries should be directed to K.-H.Y. We aim to forward all requests to the managers of these institutional datasets within 2 weeks, and these requests will be evaluated according to their institutional policies. Data are strictly for non-commercial academic use. This study relies on retrospective analysis of anonymized pathology slides.
Source data are provided with this paper.

Code availability

All code was implemented in Python using PyTorch as the primary deep learning package. The source code for CHIEF is available at https://github.com/hms-dbmi/CHIEF. Our Docker images are available at https://hub.docker.com/r/chiefcontainer/chief.

References

  1. Van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic. Nat. Med. 27, 775–784 (2021).
  2. Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat. Cancer 3, 1026–1038 (2022).
  3. Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).
  4. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
  5. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
  6. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
  7. Coudray, N. et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
  8. Nasrallah, M. P. et al. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med 4, 526–540 (2023).
  9. Tsai, P.-C. et al. Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients. Nat. Commun. 14, 2102 (2023).
  10. Yu, K.-H. et al. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J. Am. Med. Inform. Assoc. 27, 757–769 (2020).
  11. Yu, K.-H. et al. Association of omics features with histopathology patterns in lung adenocarcinoma. Cell Syst. 5, 620–627 (2017).
  12. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).
  13. Marostica, E. et al. Development of a histopathology informatics pipeline for classification and prediction of clinical outcomes in subtypes of renal cell carcinoma. Clin. Cancer Res. 27, 2868–2878 (2021).
  14. Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
  15. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).
  16. Yu, K.-H. et al. Deciphering serous ovarian carcinoma histopathology and platinum response by convolutional neural networks. BMC Med. 18, 236 (2020).
  17. Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).
  18. Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).
  19. Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).
  20. Ektefaie, Y. et al. Integrative multiomics-histopathology analysis for breast cancer classification. NPJ Breast Cancer 7, 147 (2021).
  21. Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719–731 (2018).
  22. Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 6, 1346–1352 (2022).
  23. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
  24. Chen, C. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng. 6, 1420–1434 (2022).
  25. Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).
  26. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
  27. Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650–1661 (2023).
  28. Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
  29. Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
  30. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
  31. Koziarski, M. et al. DiagSet: a dataset for prostate cancer histopathological image classification. Sci. Rep. 14, 6780 (2024).
  32. Yu, G. et al. Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images. Nat. Commun. 12, 6311 (2021).
  33. Loménie, N. et al. Can AI predict epithelial lesion categories via automated analysis of cervical biopsies: the TissueNet challenge? J. Pathol. Inform. 13, 100149 (2022).
  34. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).
  35. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proc. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition 14313–14323 (IEEE, 2021).
  36. Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).
  37. Petrini, I. et al. A specific missense mutation in GTF2I occurs at high frequency in thymic epithelial tumors. Nat. Genet. 46, 844–849 (2014).
  38. Carbone, M. et al. Biological mechanisms and clinical significance of BAP1 mutations in human cancer. Cancer Discov. 10, 1103–1120 (2020).
  39. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).
  40. Louis, D. N. et al. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol. 23, 1231–1251 (2021).
  41. Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).
  42. Kim, K. et al. PAIP 2020: microsatellite instability prediction in colorectal cancer. Med. Image Anal. 89, 102886 (2023).
  43. Amin, M. B. et al. The Eighth Edition AJCC Cancer Staging Manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J. Clin. 67, 93–99 (2017).
  44. Achiam, J. et al. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).
  45. Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://doi.org/10.48550/arXiv.2312.11805 (2023).
  46. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).
  47. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  48. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
  49. Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154–163 (2022).
  50. Yacob, F. et al. Weakly supervised detection and classification of basal cell carcinoma using graph-transformer on whole slide images. Sci. Rep. 13, 7555 (2023).
  51. Xu, F. et al. Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides. Front. Oncol. 11, 4133 (2021).
  52. Weitz, P. et al. A multi-stain breast cancer histological whole-slide-image data set from routine diagnostics. Sci. Data 10, 562 (2023).
  53. Wang, C.-W. et al. Histopathological whole slide image dataset for classification of treatment effectiveness to ovarian cancer. Sci. Data 9, 25 (2022).
  54. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).
  55. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979).
  56. Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) (ICLR, 2015).
  57. Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. In Proc. 5th International Conference on Learning Representations 1769–1784 (ICLR, 2017).
  58. Stadler, C. B. et al. Proactive construction of an annotated imaging database for artificial intelligence training. J. Digit. Imaging 34, 105–115 (2021).
  59. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
  60. Black, A. et al. PLCO: evolution of an epidemiologic resource and opportunities for future studies. Rev. Recent Clin. Trials 10, 238–245 (2015).
  61. Shao, Z. et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 34, 2136–2147 (2021).
  62. Liang, J. et al. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat. Mach. Intell. 5, 408–420 (2023).
  63. Courtiol, P. et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 25, 1519–1525 (2019).
  64. Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).


Acknowledgements

We thank C. Burroughs, M. Kapanadze and F. McDonald for administrative support; and the AWS Cloud Credits for Research programme, the Microsoft Azure for Research Award programme, the NVIDIA GPU Grant Program and the Extreme Science and Engineering Discovery Environment at the Pittsburgh Supercomputing Center (allocation TGBCS180016) for computational support. K.-H.Y. is in part supported by the National Institute of General Medical Sciences grant R35GM142879, the Department of Defense Peer Reviewed Cancer Research Program Career Development Award HT9425-231-0523, the Research Scholar Grant RSG-24-1253761-01-ESED (grant DOI: https://doi.org/10.53354/ACS.RSG-24-1253761-01-ESED.pc.gr.193749) from the American Cancer Society, a Google Research Scholar Award, the Harvard Medical School Dean’s Innovation Award and the Blavatnik Center for Computational Biomedicine Award. K.L.L. is in part supported by the National Institutes of Health award P50CA165962 and the 3000 Miles to the Cure Foundation. The PAIP data were provided by the Seoul National University Hospital and funded by the Ministry of Health and Welfare, Republic of Korea (grant number HI18C0316).

Author information

Author notes

  1. These authors contributed equally: Xiyue Wang, Junhan Zhao

Authors and Affiliations

  1. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
    Xiyue Wang, Junhan Zhao, Eliana Marostica, Christopher R. Jackson, Sen Yang & Kun-Hsing Yu
  2. Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
    Xiyue Wang, Ruijiang Li & Sen Yang
  3. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
    Junhan Zhao
  4. Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, Boston, MA, USA
    Eliana Marostica
  5. College of Biomedical Engineering, Sichuan University, Chengdu, China
    Wei Yuan, Jiayu Zhang & Jing Zhang
  6. Department of Pathology, Sun Yat-sen University Cancer Center, Guangzhou, China
    Jietian Jin
  7. Department of Pathology, Shenzhen Maternity & Child Healthcare Hospital, Shenzhen, China
    Hongping Tang
  8. Department of Radiation Oncology, Chongqing University Cancer Hospital, Chongqing, China
    Kanran Wang
  9. Department of Pathology, Chongqing University Cancer Hospital, Chongqing, China
    Yu Li
  10. Department of Pathology, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, China
    Fang Wang
  11. Department of Pathology, The First Affiliated Hospital of Jinan University, Guangzhou, China
    Yulong Peng
  12. Department of Burn, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
    Junyou Zhu
  13. Department of Pathology and Laboratory Medicine, Pennsylvania State University, Hummelstown, PA, USA
    Christopher R. Jackson
  14. Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
    Christopher R. Jackson
  15. Tencent AI Lab, Shenzhen, China
    Jun Zhang & Xiao Han
  16. Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
    Deborah Dillon, Lynette Sholl, Thomas Denize, David Meredith, Keith L. Ligon, Sabina Signoretti, Shuji Ogino, Jeffrey A. Golden & Kun-Hsing Yu
  17. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
    Nancy U. Lin
  18. Department of Pathology, Dana-Farber Cancer Institute, Boston, MA, USA
    Lynette Sholl, Thomas Denize, Keith L. Ligon & Sabina Signoretti
  19. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
    Shuji Ogino
  20. Broad Institute of MIT and Harvard, Cambridge, MA, USA
    Shuji Ogino
  21. Department of Pathology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
    Jeffrey A. Golden
  22. Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
    MacLean P. Nasrallah
  23. Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
    Kun-Hsing Yu

Authors

  1. Xiyue Wang
  2. Junhan Zhao
  3. Eliana Marostica
  4. Wei Yuan
  5. Jietian Jin
  6. Jiayu Zhang
  7. Ruijiang Li
  8. Hongping Tang
  9. Kanran Wang
  10. Yu Li
  11. Fang Wang
  12. Yulong Peng
  13. Junyou Zhu
  14. Jing Zhang
  15. Christopher R. Jackson
  16. Jun Zhang
  17. Deborah Dillon
  18. Nancy U. Lin
  19. Lynette Sholl
  20. Thomas Denize
  21. David Meredith
  22. Keith L. Ligon
  23. Sabina Signoretti
  24. Shuji Ogino
  25. Jeffrey A. Golden
  26. MacLean P. Nasrallah
  27. Xiao Han
  28. Sen Yang
  29. Kun-Hsing Yu

Contributions

X.W., J. Zhao, S.Y. and K.-H.Y. conceived and designed the study. J. Zhao, E.M., D.D., N.U.L., L.S., T.D., D.M., K.L.L., S.S., S.O., J.A.G., M.P.N., K.-H.Y., F.W., H.T., Jing Zhang, K.W. and Y.L. curated the data from their respective institutes. X.W., J. Zhao, S.Y., W.Y., Jiayu Zhang and K.-H.Y. developed, validated and evaluated the models. J.J., F.W., K.W., Y.L., Y.P., J. Zhu, C.R.J., J.A.G., M.P.N. and K.-H.Y. interpreted the pathological images. Jun Zhang, Jing Zhang, X.H. and R.L. contributed to the technical discussion. X.W., J. Zhao, E.M., C.R.J., J.A.G., J.J., F.W., S.Y. and K.-H.Y. interpreted the analytical results. X.W., J. Zhao, S.Y. and K.-H.Y. wrote the manuscript. All authors contributed to the edits of the manuscript. K.-H.Y. supervised the project.

Corresponding authors

Correspondence to Sen Yang or Kun-Hsing Yu.

Ethics declarations

Competing interests

Jun Zhang and X.H. were employees of Tencent AI Lab. K.-H.Y. is an inventor on US patent 16/179,101 (patent assigned to Harvard University) and was a consultant for Curatio.DL (not related to this work). K.L.L. was a consultant for Travera, BMS, Servier, Integragen, LEK and Blaze Bioscience, received equity from Travera, and has research funding from BMS and Lilly (not related to this work). C.R.J. is an inventor on US patent applications 17/073,123 and 63/528,496 (patents assigned to Dartmouth Hitchcock Medical Center and ViewsML) and is a consultant and CSO for ViewsML, none of which is related to this work.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 CHIEF accurately identified the origins of tumors, with results validated in independent patient cohorts from the Clinical Proteomic Tumor Analysis Consortium (CPTAC).

a. The confusion matrix of CHIEF’s prediction in the held-out test sets. The overall macro-averaged accuracy of CHIEF is 0.895. b. CHIEF achieved high prediction performance and generalizability to independent cohorts in tumor origin prediction (AUROC = 0.9853 ± 0.0245). Micro-averaged one-versus-rest ROC curves for tumor origin classification are shown. We presented the AUROC ± s.d. calculated across 18 tumor origins. In comparison, state-of-the-art methods have substantially lower performance in the independent cohorts (two-sided Wilcoxon signed-rank test P-value = 0.000015). c. CHIEF attained higher accuracy than state-of-the-art deep learning methods in tumor origin prediction. Overall accuracies for the held-out (n = 1,895) and independent test sets (n = 3,019) for CHIEF and other deep learning methods are shown. d. CHIEF attained higher AUROC, sensitivity, and specificity for each tumor origin in the held-out test sets (n = 1,895) compared with other methods. The model performance for all 18 tumor origins is shown. e. CHIEF possessed significantly higher AUROC, sensitivity, and specificity for each origin in the independent test sets (n = 3,019, P-value = 0.003906, two-sided Wilcoxon signed-rank test). In contrast, standard machine learning approaches suffer from substantial performance drops when applied to patient cohorts not involved in model development. In c-e, error bars represent 95% confidence intervals computed by the bootstrap method (n = 1,000 replicates), and the centers represent the values of various performance metrics specified in these figure panels. The detailed sample size for each cancer type shown in d-e can be found in Supplementary Table 14.
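For readers reimplementing the error bars above, a minimal sketch of a percentile bootstrap with 1,000 replicates follows. The data, the accuracy metric, and the percentile variant are illustrative assumptions; the paper states only that a bootstrap with n = 1,000 replicates was used.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a performance metric.
    (A common variant; the exact resampling scheme in the paper is not specified.)"""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample cases with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), lo, hi

# toy example: accuracy of hypothetical slide-level predictions (not study data)
accuracy = lambda t, p: float(np.mean(t == p))
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 20)
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1] * 20)
point, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
```

The percentile interval brackets the point estimate directly from the resampled metric distribution, avoiding any normality assumption.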

Source Data

Extended Data Fig. 2 Visualization of model attention scores showed CHIEF accurately identified cancerous regions of melanoma, lung, and kidney cancers.

For each cancer type, the left image panel represented the ground truth annotations labeled by experienced pathologists. Because CHIEF employs a weakly supervised approach that only requires slide-level annotations, these region-level annotations were not used during the training phase. The middle panel visualized the amount of attention CHIEF paid to each region in the WSIs. The right panel showed the zoomed-in view of regions receiving high (image tiles with red outlines) and low (image tiles with black outlines) attention scores. The original WSIs and their corresponding heatmaps are available at https://yulab.hms.harvard.edu/projects/CHIEF/CHIEF.htm.

Extended Data Fig. 3 Detailed genetic mutation prediction results organized by cancer types.

Prediction performance of prevalent genetic mutations (n = 11,483) and targeted-therapy-associated genetic mutations (n = 6,013) is shown. The detailed sample counts for each genetic mutation are available in Supplementary Tables 17, 18. CHIEF predicted several prevalent mutations (e.g., TP53 in ACC, LGG, and UCEC) with AUROCs > 0.80. The mean ± 95% confidence interval is shown for each prediction task. Error bars represent the 95% confidence intervals estimated by 5-fold cross-validation (5 independent runs).

Source Data

Extended Data Fig. 4 CHIEF attained a high performance in predicting genetic mutation status from histopathology images across cancer types.

Prediction performance in the held-out test set (TCGA) and independent test set (CPTAC) is shown side by side. These results are grouped by gene to highlight the prediction performance of the same genes across cancer types. The red and blue horizontal lines represent the average AUROCs in the held-out and independent test sets, respectively. Top, CHIEF’s performance in predicting mutation status for frequently mutated genes across cancer types. Supplementary Tables 17 and 19 show the detailed sample count for each cancer type. Bottom, CHIEF’s performance in predicting genetic mutation status related to FDA-approved targeted therapies. Supplementary Tables 18 and 20 show the detailed sample count for each cancer type. In both panels, results are presented as mean ± 95% confidence interval. Error bars represent the 95% confidence intervals estimated by 5-fold cross-validation.
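A sketch of how a 95% confidence interval can be derived from per-fold cross-validation scores. The paper does not state its exact estimator; the normal approximation (mean ± 1.96 × s.e.m.) below and the per-fold AUROC values are assumptions for illustration.

```python
import numpy as np

def cv_confidence_interval(fold_scores, z=1.96):
    """Normal-approximation 95% CI from per-fold metric values.
    (One common convention; the paper's exact estimator is not specified.)"""
    scores = np.asarray(fold_scores, dtype=float)
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(len(scores))  # standard error of the mean
    return mean, mean - z * sem, mean + z * sem

# hypothetical per-fold AUROCs for one mutation-prediction task
mean, lo, hi = cv_confidence_interval([0.81, 0.79, 0.84, 0.80, 0.82])
```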

Source Data

Extended Data Fig. 5 CHIEF predicted IDH status of glioma samples in several patient cohorts.

CHIEF classified glioma samples with and without IDH mutation. Here, we showed that CHIEF successfully predicted IDH mutation status in both high and low histological grade groups defined by conventional visual-based histopathology assessment. a. Regions with increased cellularity and perinuclear halos received high model attention in IDH-mutant samples, while regions showing poorer cell adhesion received high attention in IDH-wildtype slides. We used samples from the MUV-GBM dataset as an example for this visualization. The bottom figures show the corresponding image tiles. Six experienced pathologists (see Methods) examined these tiles independently and annotated the morphological patterns correlated with regions receiving high and low attention. b. IDH-mutant gliomas from the six cohorts exhibit a similar bimodal distribution along the attention score axis. In contrast, IDH-wildtype gliomas display a unimodal distribution with mostly low-attention image regions. We normalized the attention scores to a range from 0 to 1, representing the importance of each image tile to the prediction output by CHIEF. These analyses included samples from TCGA-GBM (n = 834), MUV-GBM (n = 507), HMS-GBM (n = 88), TCGA-LGG (n = 842), MUV-LGG (n = 365), and HMS-LGG (n = 82). In these violin plots, the central white dots represent the median, the thick black bars indicate the interquartile range (IQR), and the thin black lines (whiskers) extend to 1.5 times the IQR from the first and third quartiles. The width of the violin represents the density of data at different values.
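The 0-to-1 rescaling of per-tile attention scores mentioned above can be sketched as min-max normalization. This is a standard choice, but an assumption here: the caption states only that scores were normalized to [0, 1].

```python
import numpy as np

def normalize_attention(scores):
    """Min-max rescaling of per-tile attention scores to [0, 1].
    (Assumed normalization; the paper does not name the exact transform.)"""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:                        # constant scores: map everything to 0
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

# toy raw attention logits for four tiles of one slide
tiles = normalize_attention([-2.0, 0.0, 1.0, 4.0])
```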

Source Data

Extended Data Fig. 6 CHIEF predicted MSI status in several colorectal cancer patient cohorts.

a. Solid tumor regions of MSI-high samples received high attention scores, while adjacent benign mucosal epithelium regions received low attention scores. In MSI-low samples, most regions received low attention scores. Example images from the PAIP2020 dataset were shown in this visualization. The bottom portion of this figure panel showed image tiles receiving high and low attention scores. Malignant regions were highly attended in both MSI-low and MSI-high samples. Solid tumors, intraluminal and extraluminal mucin, and signet ring cells received high attention in MSI-high samples. In MSI-low samples, infiltrative malignant glands interfacing with fibroblasts, luminal necrosis, and lymphocytic infiltrates received relatively high attention. Adjacent benign colonic epithelium received low attention in both MSI-high and MSI-low patients. b. CHIEF paid high levels of attention to 30% of regions in MSI-high samples, while more regions in MSI-low samples received low attention scores. Attention score distributions of the three patient cohorts (n = 437 in TCGA-COADREAD, n = 77 in PAIP2020, and n = 221 in CPTAC-COAD) are shown. In these violin plots, the central white dots represent the median, the thick black bars indicate the interquartile range (IQR), and the thin black lines (whiskers) extend to 1.5 times the IQR from the first and third quartiles. The width of the violin represents the density of data at different values.

Source Data

Extended Data Fig. 7 Survival prediction results for patients with all stages.

Previous methods pooled patients with all stages in their survival outcome prediction12,62,63. To facilitate comparisons with these previous reports, we compared CHIEF with baseline methods in this study setting, using 9,404 whole slide images from 6,464 patients. CHIEF attained substantially better survival prediction performance (unadjusted two-sided log-rank test P-value < 0.05 in all patient cohorts under study) and distinguished patients with different survival outcomes using histopathology images alone. Supplementary Fig. 5 shows results from two baseline methods (PORPOISE and DSMIL). Error bands represent 95% confidence intervals.
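The survival curves compared by the log-rank tests above are Kaplan-Meier estimates. A minimal sketch of the standard product-limit estimator follows; the toy cohort is invented for illustration and is not study data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates (standard product-limit estimator).
    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns (time, survival probability) pairs at each event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    times = [times[i] for i in order]
    events = [events[i] for i in order]
    at_risk = len(times)
    surv, curve, i = 1.0, [], 0
    while i < len(times):
        t = times[i]
        deaths = censored = 0
        while i < len(times) and times[i] == t:  # group ties at the same time
            deaths += events[i]
            censored += 1 - events[i]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk       # product-limit step
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

# toy cohort: times in months, 0 = censored at last follow-up
curve = kaplan_meier([5, 8, 8, 12, 16], [1, 1, 0, 1, 0])
```

Stratifying patients (for example, by CHIEF risk score) and comparing the resulting curves is what the unadjusted log-rank test in this figure evaluates.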

Source Data

Extended Data Fig. 8 Visualization of model attention showed regions of importance in survival prediction for lung cancer patients.

In patients with shorter-term survival, CHIEF paid high levels of attention to lesional regions with high tumor cellularity and strands of fibrosis in lung adenocarcinoma, tumor budding in squamous cell carcinoma, and necrotic regions in both types of lung cancers. In contrast, highly attended regions in patients with lower mortality risks highlighted dyskeratosis in lung squamous cell carcinoma. The original WSIs and their corresponding heatmaps are available at https://yulab.hms.harvard.edu/projects/CHIEF/CHIEF_survival.htm.

Extended Data Fig. 9 Quantitative analyses of regions receiving high attention revealed pathology microenvironments predictive of molecular profiles and survival outcomes.

For each WSI, we selected the top 1% of patches with the highest attention from CHIEF at 40× magnification. We excluded WSIs with fewer than 100 image patches. We employed Hover-Net64 trained with pathologists’ annotations in the PanNuke dataset (including tumor cells, lymphocytes, stromal cells, necrotic cells, and epithelial cells) for cell segmentation and classification. We compared the cell type compositions across different patient groups. a. Colorectal cancer samples with MSI-high status have significantly more tumor-infiltrating lymphocytes in the high-attention regions (unadjusted two-sided Mann-Whitney U test P-value = 0.00052 in PAIP2020, P-value = 0.00016 in CPTAC-COAD). b. IDH wild-type glioma samples have significantly more necrotic cells (unadjusted two-sided Mann-Whitney U test P-value = 0.00006 in TCGA-GBM and P-value = 0.000001 in TCGA-LGG). c. Samples from longer-term colorectal cancer survivors have a larger number of stromal cells, more tumor-infiltrating lymphocytes, and fewer tumor cells in the high-attention regions, compared with those with shorter-term survival. Samples from shorter-term lung squamous cell carcinoma survivors have a larger fraction of tumor cells and smaller fractions of lymphocytes and epithelial cells in the high-attention regions, compared with those with longer-term survival. These analyses included samples from PAIP2020 (n = 77), CPTAC-COAD (n = 221), TCGA-GBM (n = 825), TCGA-LGG (n = 834), TCGA-COADREAD (n = 520), and TCGA-LUSC (n = 400). In these box plots, the central lines indicate the median, box bounds are the 25th and 75th percentiles, and whiskers extend to 1.5 times the interquartile range. In these figures, one star (*), two stars (**), three stars (***), and four stars (****) represent P-value < 0.05, P-value < 0.01, P-value < 0.001, and P-value < 0.0001, respectively.
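The top-1% patch selection described above can be sketched as follows. The tie-breaking rule and the minimum of one patch are assumptions; the caption specifies only the top 1% of patches by attention and the exclusion of WSIs with fewer than 100 patches.

```python
import numpy as np

def top_attention_patches(attention, frac=0.01):
    """Indices of the top `frac` fraction of patches by attention score
    (a sketch of the selection rule; at least one patch is always kept)."""
    attention = np.asarray(attention, dtype=float)
    k = max(1, int(round(frac * len(attention))))
    return np.argsort(attention)[::-1][:k]      # highest scores first

scores = np.linspace(0.0, 1.0, 500)             # 500 hypothetical patch scores
idx = top_attention_patches(scores)             # top 1% of 500 -> 5 patches
```

The selected patches would then be passed to the cell segmentation and classification step, and per-group cell-type fractions compared with the Mann-Whitney U test.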

Source Data


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, X., Zhao, J., Marostica, E. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature (2024). https://doi.org/10.1038/s41586-024-07894-z
