A pathology foundation model for cancer diagnosis and prognosis prediction

Data availability

This work utilized 16 pathology datasets from large research consortia, including TCGA (https://portal.gdc.cancer.gov), GTEx (https://www.gtexportal.org/home/), PAIP (http://www.wisepaip.org/paip), PANDA (https://www.kaggle.com/c/prostate-cancer-grade-assessment), BCC (https://datahub.aida.scilifelab.se/10.23698/aida/bccc), ACROBAT (https://doi.org/10.48723/w728-p041), BCNB (https://bcnb.grand-challenge.org/), TOC (https://www.cancerimagingarchive.net/collection/ovarian-bevacizumab-response/), CPTAC (https://portal.gdc.cancer.gov), DROID-Breast (https://datahub.aida.scilifelab.se/10.23698/aida/drbr), Dataset-PT (https://github.com/CSU-BME/pathology_SSL), Diagset-B (https://github.com/michalkoziarski/DiagSet), MUV (https://doi.org/10.25493/WQ48-ZGX) and PLCO (https://cdas.cancer.gov/plco/). The other two datasets, PAIP2020 and TissueNet, can be requested from the respective data science challenge organizers: PAIP2020 (https://paip2020.grand-challenge.org/) and TissueNet (https://www.drivendata.org/competitions/67/competition-cervical-biopsy/). Supplementary Table 22 provides the links to the raw data from these sources. We obtained institutional data for CHIEF pretraining and validation from DFCI, BWH, YH, SMCH, CUCH and the Hospital of the University of Pennsylvania. These data are not publicly available owing to patient privacy obligations and institutional review board and data use agreement requirements. Researchers may obtain de-identified data directly from DFCI, BWH, YH, SMCH, CUCH and the Hospital of the University of Pennsylvania upon reasonable request and subject to institutional ethical approvals. Data access enquiries should be directed to K.-H.Y. We aim to forward all requests to the managers of these institutional datasets within 2 weeks, and these requests will be evaluated according to their institutional policies. Data are strictly for non-commercial academic use. This study relies on retrospective analysis of anonymized pathology slides.
Source data are provided with this paper.

Code availability

All code was implemented in Python using PyTorch as the primary deep learning package. The source code for CHIEF is available at https://github.com/hms-dbmi/CHIEF. Our Docker images are available at https://hub.docker.com/r/chiefcontainer/chief.

References

  1. Van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic. Nat. Med. 27, 775–784 (2021).
  2. Shmatko, A., Ghaffari Laleh, N., Gerstung, M. & Kather, J. N. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nat. Cancer 3, 1026–1038 (2022).
  3. Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).
  4. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
  5. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
  6. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
  7. Coudray, N. et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
  8. Nasrallah, M. P. et al. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med 4, 526–540 (2023).
  9. Tsai, P.-C. et al. Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients. Nat. Commun. 14, 2102 (2023).
  10. Yu, K.-H. et al. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J. Am. Med. Inform. Assoc. 27, 757–769 (2020).
  11. Yu, K.-H. et al. Association of omics features with histopathology patterns in lung adenocarcinoma. Cell Syst. 5, 620–627 (2017).
  12. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).
  13. Marostica, E. et al. Development of a histopathology informatics pipeline for classification and prediction of clinical outcomes in subtypes of renal cell carcinoma. Clin. Cancer Res. 27, 2868–2878 (2021).
  14. Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
  15. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).
  16. Yu, K.-H. et al. Deciphering serous ovarian carcinoma histopathology and platinum response by convolutional neural networks. BMC Med. 18, 236 (2020).
  17. Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).
  18. Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).
  19. Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124, 686–696 (2021).
  20. Ektefaie, Y. et al. Integrative multiomics-histopathology analysis for breast cancer classification. NPJ Breast Cancer 7, 147 (2021).
  21. Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719–731 (2018).
  22. Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 6, 1346–1352 (2022).
  23. Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
  24. Chen, C. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng. 6, 1420–1434 (2022).
  25. Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).
  26. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
  27. Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650–1661 (2023).
  28. Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
  29. Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
  30. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
  31. Koziarski, M. et al. DiagSet: a dataset for prostate cancer histopathological image classification. Sci. Rep. 14, 6780 (2024).
  32. Yu, G. et al. Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images. Nat. Commun. 12, 6311 (2021).
  33. Loménie, N. et al. Can AI predict epithelial lesion categories via automated analysis of cervical biopsies: the TissueNet challenge? J. Pathol. Inform. 13, 100149 (2022).
  34. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).
  35. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proc. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition 14313–14323 (IEEE, 2021).
  36. Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).
  37. Petrini, I. et al. A specific missense mutation in GTF2I occurs at high frequency in thymic epithelial tumors. Nat. Genet. 46, 844–849 (2014).
  38. Carbone, M. et al. Biological mechanisms and clinical significance of BAP1 mutations in human cancer. Cancer Discov. 10, 1103–1120 (2020).
  39. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).
  40. Louis, D. N. et al. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol. 23, 1231–1251 (2021).
  41. Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).
  42. Kim, K. et al. PAIP 2020: microsatellite instability prediction in colorectal cancer. Med. Image Anal. 89, 102886 (2023).
  43. Amin, M. B. et al. The Eighth Edition AJCC Cancer Staging Manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J. Clin. 67, 93–99 (2017).
  44. Achiam, J. et al. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).
  45. Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://doi.org/10.48550/arXiv.2312.11805 (2023).
  46. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).
  47. The Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  48. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
  49. Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154–163 (2022).
  50. Yacob, F. et al. Weakly supervised detection and classification of basal cell carcinoma using graph-transformer on whole slide images. Sci. Rep. 13, 7555 (2023).
  51. Xu, F. et al. Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides. Front. Oncol. 11, 4133 (2021).
  52. Weitz, P. et al. A multi-stain breast cancer histological whole-slide-image data set from routine diagnostics. Sci. Data 10, 562 (2023).
  53. Wang, C.-W. et al. Histopathological whole slide image dataset for classification of treatment effectiveness to ovarian cancer. Sci. Data 9, 25 (2022).
  54. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).
  55. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979).
  56. Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) (ICLR, 2015).
  57. Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. In Proc. 5th International Conference on Learning Representations 1769–1784 (ICLR, 2017).
  58. Stadler, C. B. et al. Proactive construction of an annotated imaging database for artificial intelligence training. J. Digit. Imaging 34, 105–115 (2021).
  59. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
  60. Black, A. et al. PLCO: evolution of an epidemiologic resource and opportunities for future studies. Rev. Recent Clin. Trials 10, 238–245 (2015).
  61. Shao, Z. et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 34, 2136–2147 (2021).
  62. Liang, J. et al. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat. Mach. Intell. 5, 408–420 (2023).
  63. Courtiol, P. et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 25, 1519–1525 (2019).
  64. Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).


Acknowledgements

We thank C. Burroughs, M. Kapanadze and F. McDonald for administrative support; and the AWS Cloud Credits for Research programme, the Microsoft Azure for Research Award programme, the NVIDIA GPU Grant Program and the Extreme Science and Engineering Discovery Environment at the Pittsburgh Supercomputing Center (allocation TGBCS180016) for computational support. K.-H.Y. is in part supported by the National Institute of General Medical Sciences grant R35GM142879, the Department of Defense Peer Reviewed Cancer Research Program Career Development Award HT9425-231-0523, the Research Scholar Grant RSG-24-1253761-01-ESED (grant DOI: https://doi.org/10.53354/ACS.RSG-24-1253761-01-ESED.pc.gr.193749) from the American Cancer Society, a Google Research Scholar Award, the Harvard Medical School Dean’s Innovation Award and the Blavatnik Center for Computational Biomedicine Award. K.L.L. is in part supported by the National Institutes of Health award P50CA165962 and the 3000 Miles to the Cure Foundation. The PAIP data were provided by the Seoul National University Hospital and funded by the Ministry of Health and Welfare, Republic of Korea (grant number HI18C0316).

Author information

Author notes

  1. These authors contributed equally: Xiyue Wang, Junhan Zhao

Authors and Affiliations

  1. Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
    Xiyue Wang, Junhan Zhao, Eliana Marostica, Christopher R. Jackson, Sen Yang & Kun-Hsing Yu
  2. Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
    Xiyue Wang, Ruijiang Li & Sen Yang
  3. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
    Junhan Zhao
  4. Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, Boston, MA, USA
    Eliana Marostica
  5. College of Biomedical Engineering, Sichuan University, Chengdu, China
    Wei Yuan, Jiayu Zhang & Jing Zhang
  6. Department of Pathology, Sun Yat-sen University Cancer Center, Guangzhou, China
    Jietian Jin
  7. Department of Pathology, Shenzhen Maternity & Child Healthcare Hospital, Shenzhen, China
    Hongping Tang
  8. Department of Radiation Oncology, Chongqing University Cancer Hospital, Chongqing, China
    Kanran Wang
  9. Department of Pathology, Chongqing University Cancer Hospital, Chongqing, China
    Yu Li
  10. Department of Pathology, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, China
    Fang Wang
  11. Department of Pathology, The First Affiliated Hospital of Jinan University, Guangzhou, China
    Yulong Peng
  12. Department of Burn, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
    Junyou Zhu
  13. Department of Pathology and Laboratory Medicine, Pennsylvania State University, Hummelstown, PA, USA
    Christopher R. Jackson
  14. Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
    Christopher R. Jackson
  15. Tencent AI Lab, Shenzhen, China
    Jun Zhang & Xiao Han
  16. Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
    Deborah Dillon, Lynette Sholl, Thomas Denize, David Meredith, Keith L. Ligon, Sabina Signoretti, Shuji Ogino, Jeffrey A. Golden & Kun-Hsing Yu
  17. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
    Nancy U. Lin
  18. Department of Pathology, Dana-Farber Cancer Institute, Boston, MA, USA
    Lynette Sholl, Thomas Denize, Keith L. Ligon & Sabina Signoretti
  19. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
    Shuji Ogino
  20. Broad Institute of MIT and Harvard, Cambridge, MA, USA
    Shuji Ogino
  21. Department of Pathology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
    Jeffrey A. Golden
  22. Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
    MacLean P. Nasrallah
  23. Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
    Kun-Hsing Yu

Authors

  1. Xiyue Wang
  2. Junhan Zhao
  3. Eliana Marostica
  4. Wei Yuan
  5. Jietian Jin
  6. Jiayu Zhang
  7. Ruijiang Li
  8. Hongping Tang
  9. Kanran Wang
  10. Yu Li
  11. Fang Wang
  12. Yulong Peng
  13. Junyou Zhu
  14. Jing Zhang
  15. Christopher R. Jackson
  16. Jun Zhang
  17. Deborah Dillon
  18. Nancy U. Lin
  19. Lynette Sholl
  20. Thomas Denize
  21. David Meredith
  22. Keith L. Ligon
  23. Sabina Signoretti
  24. Shuji Ogino
  25. Jeffrey A. Golden
  26. MacLean P. Nasrallah
  27. Xiao Han
  28. Sen Yang
  29. Kun-Hsing Yu

Contributions

X.W., J. Zhao, S.Y. and K.-H.Y. conceived and designed the study. J. Zhao, E.M., D.D., N.U.L., L.S., T.D., D.M., K.L.L., S.S., S.O., J.A.G., M.P.N., K.-H.Y., F.W., H.T., Jing Zhang, K.W. and Y.L. curated the data from their respective institutes. X.W., J. Zhao, S.Y., W.Y., Jiayu Zhang and K.-H.Y. developed, validated and evaluated the models. J.J., F.W., K.W., Y.L., Y.P., J. Zhu, C.R.J., J.A.G., M.P.N. and K.-H.Y. interpreted the pathological images. Jun Zhang, Jing Zhang, X.H. and R.L. contributed to the technical discussion. X.W., J. Zhao, E.M., C.R.J., J.A.G., J.J., F.W., S.Y. and K.-H.Y. interpreted the analytical results. X.W., J. Zhao, S.Y. and K.-H.Y. wrote the manuscript. All authors contributed to the edits of the manuscript. K.-H.Y. supervised the project.

Corresponding authors

Correspondence to Sen Yang or Kun-Hsing Yu.

Ethics declarations

Competing interests

Jun Zhang and X.H. were employees of Tencent AI Lab. K.-H.Y. is an inventor on US patent 16/179,101 (patent assigned to Harvard University) and was a consultant for Curatio.DL (not related to this work). K.L.L. was a consultant for Travera, BMS, Servier, Integragen, LEK and Blaze Bioscience, received equity from Travera, and has research funding from BMS and Lilly (not related to this work). C.R.J. is an inventor on US patent applications 17/073,123 and 63/528,496 (patents assigned to Dartmouth Hitchcock Medical Center and ViewsML) and is a consultant and CSO for ViewsML, none of which is related to this work.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 CHIEF accurately identified the origins of tumors, with results validated in independent patient cohorts from the Clinical Proteomic Tumor Analysis Consortium (CPTAC).

a. The confusion matrix of CHIEF’s prediction in the held-out test sets. The overall macro-averaged accuracy of CHIEF is 0.895. b. CHIEF achieved high prediction performance and generalizability to independent cohorts in tumor origin prediction (AUROC = 0.9853 ± 0.0245). Micro-averaged one-versus-rest ROC curves for tumor origin classification are shown. We presented the AUROC ± s.d. calculated across 18 tumor origins. In comparison, state-of-the-art methods have substantially lower performance in the independent cohorts (two-sided Wilcoxon signed-rank test P-value = 0.000015). c. CHIEF attained higher accuracy than state-of-the-art deep learning methods in tumor origin prediction. Overall accuracies for the held-out (n = 1,895) and independent test sets (n = 3,019) for CHIEF and other deep learning methods are shown. d. CHIEF attained higher AUROC, sensitivity, and specificity for each tumor origin in the held-out test sets (n = 1,895) compared with other methods. The model performance for all 18 tumor origins is shown. e. CHIEF possessed significantly higher AUROC, sensitivity, and specificity for each origin in the independent test sets (n = 3,019, P-value = 0.003906, two-sided Wilcoxon signed-rank test). In contrast, standard machine learning approaches suffer from substantial performance drops when applied to patient cohorts not involved in model development. In c-e, error bars represent 95% confidence intervals computed by the bootstrap method (n = 1,000 replicates), and the centers represent the values of various performance metrics specified in these figure panels. The detailed sample size for each cancer type shown in d-e can be found in Supplementary Table 14.
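For readers reimplementing the error bars above, a minimal sketch of a percentile bootstrap with 1,000 replicates follows. The data, the accuracy metric, and the percentile variant are illustrative assumptions; the paper states only that a bootstrap with n = 1,000 replicates was used.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a performance metric.
    (A common variant; the exact resampling scheme in the paper is not specified.)"""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample cases with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), lo, hi

# toy example: accuracy of hypothetical slide-level predictions (not study data)
accuracy = lambda t, p: float(np.mean(t == p))
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0] * 20)
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1] * 20)
point, lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
```

The percentile interval brackets the point estimate directly from the resampled metric distribution, avoiding any normality assumption.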

Source Data

Extended Data Fig. 2 Visualization of model attention scores showed CHIEF accurately identified cancerous regions of melanoma, lung, and kidney cancers.

For each cancer type, the left image panel represented the ground truth annotations labeled by experienced pathologists. Because CHIEF employs a weakly supervised approach that only requires slide-level annotations, these region-level annotations were not used during the training phase. The middle panel visualized the amount of attention CHIEF paid to each region in the WSIs. The right panel showed the zoomed-in view of regions receiving high (image tiles with red outlines) and low (image tiles with black outlines) attention scores. The original WSIs and their corresponding heatmaps are available at https://yulab.hms.harvard.edu/projects/CHIEF/CHIEF.htm.

Extended Data Fig. 3 Detailed genetic mutation prediction results organized by cancer types.

Prediction performance of prevalent genetic mutations (n = 11,483) and targeted-therapy-associated genetic mutations (n = 6,013) is shown. The detailed sample counts for each genetic mutation are available in Supplementary Tables 17, 18. CHIEF predicted several prevalent mutations (e.g., TP53 in ACC, LGG, and UCEC) with AUROCs > 0.80. The mean ± 95% confidence interval is shown for each prediction task. Error bars represent the 95% confidence intervals estimated by 5-fold cross-validation (5 independent runs).

Source Data

Extended Data Fig. 4 CHIEF attained a high performance in predicting genetic mutation status from histopathology images across cancer types.

Prediction performance in the held-out test set (TCGA) and independent test set (CPTAC) is shown side by side. These results are grouped by gene to highlight the prediction performance of the same genes across cancer types. The red and blue horizontal lines represent the average AUROCs in the held-out and independent test sets, respectively. Top, CHIEF’s performance in predicting mutation status for frequently mutated genes across cancer types. Supplementary Tables 17 and 19 show the detailed sample count for each cancer type. Bottom, CHIEF’s performance in predicting genetic mutation status related to FDA-approved targeted therapies. Supplementary Tables 18 and 20 show the detailed sample count for each cancer type. In both panels, results are presented as mean ± 95% confidence interval. Error bars represent the 95% confidence intervals estimated by 5-fold cross-validation.
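A sketch of how a 95% confidence interval can be derived from per-fold cross-validation scores. The paper does not state its exact estimator; the normal approximation (mean ± 1.96 × s.e.m.) below and the per-fold AUROC values are assumptions for illustration.

```python
import numpy as np

def cv_confidence_interval(fold_scores, z=1.96):
    """Normal-approximation 95% CI from per-fold metric values.
    (One common convention; the paper's exact estimator is not specified.)"""
    scores = np.asarray(fold_scores, dtype=float)
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(len(scores))  # standard error of the mean
    return mean, mean - z * sem, mean + z * sem

# hypothetical per-fold AUROCs for one mutation-prediction task
mean, lo, hi = cv_confidence_interval([0.81, 0.79, 0.84, 0.80, 0.82])
```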

Source Data

Extended Data Fig. 5 CHIEF predicted IDH status of glioma samples in several patient cohorts.

CHIEF classified glioma samples with and without IDH mutation. Here, we showed that CHIEF successfully predicted IDH mutation status in both high and low histological grade groups defined by conventional visual-based histopathology assessment. a. Regions with increased cellularity and perinuclear halos received high model attention in IDH-mutant samples, while regions showing poorer cell adhesion received high attention in IDH-wildtype slides. We used samples from the MUV-GBM dataset as an example for this visualization. The bottom figures show the corresponding image tiles. Six experienced pathologists (see Methods) examined these tiles independently and annotated the morphological patterns correlated with regions receiving high and low attention. b. IDH-mutant gliomas from the six cohorts exhibit a similar bimodal distribution along the attention score axis. In contrast, IDH-wildtype gliomas display a unimodal distribution with mostly low-attention image regions. We normalized the attention scores to a range from 0 to 1, representing the importance of each image tile to the prediction output by CHIEF. These analyses included samples from TCGA-GBM (n = 834), MUV-GBM (n = 507), HMS-GBM (n = 88), TCGA-LGG (n = 842), MUV-LGG (n = 365), and HMS-LGG (n = 82). In these violin plots, the central white dots represent the median, the thick black bars indicate the interquartile range (IQR), and the thin black lines (whiskers) extend to 1.5 times the IQR from the first and third quartiles. The width of the violin represents the density of data at different values.
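The 0-to-1 rescaling of per-tile attention scores mentioned above can be sketched as min-max normalization. This is a standard choice, but an assumption here: the caption states only that scores were normalized to [0, 1].

```python
import numpy as np

def normalize_attention(scores):
    """Min-max rescaling of per-tile attention scores to [0, 1].
    (Assumed normalization; the paper does not name the exact transform.)"""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:                        # constant scores: map everything to 0
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

# toy raw attention logits for four tiles of one slide
tiles = normalize_attention([-2.0, 0.0, 1.0, 4.0])
```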

Source Data

Extended Data Fig. 6 CHIEF predicted MSI status in several colorectal cancer patient cohorts.

a. Solid tumor regions of MSI-high samples received high attention scores, while adjacent benign mucosal epithelium regions received low attention scores. In MSI-low samples, most regions received low attention scores. Example images from the PAIP2020 dataset were shown in this visualization. The bottom portion of this figure panel showed image tiles receiving high and low attention scores. Malignant regions were highly attended in both MSI-low and MSI-high samples. Solid tumors, intraluminal and extraluminal mucin, and signet ring cells received high attention in MSI-high samples. In MSI-low samples, infiltrative malignant glands interfacing with fibroblasts, luminal necrosis, and lymphocytic infiltrates received relatively high attention. Adjacent benign colonic epithelium received low attention in both MSI-high and MSI-low patients. b. CHIEF paid high levels of attention to 30% of regions in MSI-high samples, while more regions in MSI-low samples received low attention scores. Attention score distributions of the three patient cohorts (n = 437 in TCGA-COADREAD, n = 77 in PAIP2020, and n = 221 in CPTAC-COAD) are shown. In these violin plots, the central white dots represent the median, the thick black bars indicate the interquartile range (IQR), and the thin black lines (whiskers) extend to 1.5 times the IQR from the first and third quartiles. The width of the violin represents the density of data at different values.

Source Data

Extended Data Fig. 7 Survival prediction results for patients with all stages.

Previous methods pooled patients with all stages in their survival outcome prediction12,62,63. To facilitate comparisons with these previous reports, we compared CHIEF with baseline methods in this study setting, using 9,404 whole slide images from 6,464 patients. CHIEF attained substantially better survival prediction performance (unadjusted two-sided log-rank test P-value < 0.05 in all patient cohorts under study) and distinguished patients with different survival outcomes using histopathology images alone. Supplementary Fig. 5 shows results from two baseline methods (PORPOISE and DSMIL). Error bands represent 95% confidence intervals.
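The survival curves compared by the log-rank tests above are Kaplan-Meier estimates. A minimal sketch of the standard product-limit estimator follows; the toy cohort is invented for illustration and is not study data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates (standard product-limit estimator).
    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns (time, survival probability) pairs at each event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    times = [times[i] for i in order]
    events = [events[i] for i in order]
    at_risk = len(times)
    surv, curve, i = 1.0, [], 0
    while i < len(times):
        t = times[i]
        deaths = censored = 0
        while i < len(times) and times[i] == t:  # group ties at the same time
            deaths += events[i]
            censored += 1 - events[i]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk       # product-limit step
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve

# toy cohort: times in months, 0 = censored at last follow-up
curve = kaplan_meier([5, 8, 8, 12, 16], [1, 1, 0, 1, 0])
```

Stratifying patients (for example, by CHIEF risk score) and comparing the resulting curves is what the unadjusted log-rank test in this figure evaluates.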

Source Data

Extended Data Fig. 8 Visualization of model attention showed regions of importance in survival prediction for lung cancer patients.

In patients with shorter-term survival, CHIEF paid high levels of attention to lesional regions with high tumor cellularity and strands of fibrosis in lung adenocarcinoma, tumor budding in squamous cell carcinoma, and necrotic regions in both types of lung cancers. In contrast, highly attended regions in patients with lower mortality risks highlighted dyskeratosis in lung squamous cell carcinoma. The original WSIs and their corresponding heatmaps are available at https://yulab.hms.harvard.edu/projects/CHIEF/CHIEF_survival.htm.

Extended Data Fig. 9 Quantitative analyses of regions receiving high attention revealed pathology microenvironments predictive of molecular profiles and survival outcomes.

For each WSI, we selected the top 1% of patches with the highest attention from CHIEF at 40× magnification. We excluded WSIs with fewer than 100 image patches. We employed Hover-Net64 trained with pathologists’ annotations in the PanNuke dataset (including tumor cells, lymphocytes, stromal cells, necrotic cells, and epithelial cells) for cell segmentation and classification. We compared the cell type compositions across different patient groups. a. Colorectal cancer samples with MSI-high status have significantly more tumor-infiltrating lymphocytes in the high-attention regions (unadjusted two-sided Mann-Whitney U test P-value = 0.00052 in PAIP2020, P-value = 0.00016 in CPTAC-COAD). b. IDH wild-type glioma samples have significantly more necrotic cells (unadjusted two-sided Mann-Whitney U test P-value = 0.00006 in TCGA-GBM and P-value = 0.000001 in TCGA-LGG). c. Samples from longer-term colorectal cancer survivors have a larger number of stromal cells, more tumor-infiltrating lymphocytes, and fewer tumor cells in the high-attention regions, compared with those with shorter-term survival. Samples from shorter-term lung squamous cell carcinoma survivors have a larger fraction of tumor cells and smaller fractions of lymphocytes and epithelial cells in the high-attention regions, compared with those with longer-term survival. These analyses included samples from PAIP2020 (n = 77), CPTAC-COAD (n = 221), TCGA-GBM (n = 825), TCGA-LGG (n = 834), TCGA-COADREAD (n = 520), and TCGA-LUSC (n = 400). In these box plots, the central lines indicate the median, box bounds are the 25th and 75th percentiles, and whiskers extend to 1.5 times the interquartile range. In these figures, one star (*), two stars (**), three stars (***), and four stars (****) represent P-value < 0.05, P-value < 0.01, P-value < 0.001, and P-value < 0.0001, respectively.
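The top-1% patch selection described above can be sketched as follows. The tie-breaking rule and the minimum of one patch are assumptions; the caption specifies only the top 1% of patches by attention and the exclusion of WSIs with fewer than 100 patches.

```python
import numpy as np

def top_attention_patches(attention, frac=0.01):
    """Indices of the top `frac` fraction of patches by attention score
    (a sketch of the selection rule; at least one patch is always kept)."""
    attention = np.asarray(attention, dtype=float)
    k = max(1, int(round(frac * len(attention))))
    return np.argsort(attention)[::-1][:k]      # highest scores first

scores = np.linspace(0.0, 1.0, 500)             # 500 hypothetical patch scores
idx = top_attention_patches(scores)             # top 1% of 500 -> 5 patches
```

The selected patches would then be passed to the cell segmentation and classification step, and per-group cell-type fractions compared with the Mann-Whitney U test.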

Source Data


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, X., Zhao, J., Marostica, E. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature (2024). https://doi.org/10.1038/s41586-024-07894-z
