Derivation of prognostic contextual histopathological features from whole-slide images of tumours via graph deep learning

Data availability

The TCGA WSI datasets, which were generated by the TCGA Research Network, are publicly available through the Genomic Data Commons portal. The NLST WSI datasets are available through the Cancer Imaging Archive (TCIA). The SNUH WSI datasets are not publicly available, in accordance with institutional requirements governing human-subject privacy protection. Source data are provided with this paper.

Code availability


Acknowledgements

We thank A. Choi and N. Kim for many helpful discussions and suggestions. S.K. received funding for this study and its publication from the Ministry of Science and ICT (MSIT) of the Republic of Korea and the National Research Foundation of Korea (NRF-2020R1A3B3079653), and from the BK21 FOUR programme of the Education and Research Program for Future ICT Pioneers, Seoul National University, in 2022. J.H.P. received funding for the research described in this study from the Seoul Metropolitan Government Seoul National University (SMG-SNU) Boramae Medical Center (03-2020-18).

Author information

Author notes

  1. These authors contributed equally: Yongju Lee, Jeong Hwan Park, Sohee Oh, Kyoungseob Shin.

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
    Yongju Lee, Kyoungseob Shin & Sunghoon Kwon
  2. Department of Pathology, Seoul National University College of Medicine, Seoul, Republic of Korea
    Jeong Hwan Park, Minsun Jung, Cheol Lee, Hyojin Kim, Jin-Haeng Chung & Kyung Chul Moon
  3. Department of Pathology, SMG-SNU Boramae Medical Center, Seoul, Republic of Korea
    Jeong Hwan Park
  4. Medical Research Collaborating Center, SMG-SNU Boramae Medical Center, Seoul, Republic of Korea
    Sohee Oh & Jiyu Sun
  5. Department of Pathology, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
    Minsun Jung
  6. Department of Pathology, Seoul National University Hospital, Seoul, Republic of Korea
    Cheol Lee & Kyung Chul Moon
  7. Department of Pathology and Translational Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
    Hyojin Kim & Jin-Haeng Chung
  8. Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
    Sunghoon Kwon
  9. Bio-MAX Institute, Seoul National University, Seoul, Republic of Korea
    Sunghoon Kwon
  10. BK21+ Creative Research Engineer Development for IT, Seoul National University, Seoul, Republic of Korea
    Sunghoon Kwon
  11. Biomedical Research Institute, Seoul National University, Seoul, Republic of Korea
    Sunghoon Kwon
  12. Institutes of Entrepreneurial BioConvergence, Seoul National University, Seoul, Republic of Korea
    Sunghoon Kwon


Y.L., J.H.P., S.O., K.S., K.C.M. and S.K. designed the experiments. Y.L. and K.S. wrote the code, performed the experiments and analysed the results. Y.L. designed and implemented the deep-learning model. K.S. analysed the graph features in the context of the histopathological features. J.H.P. collected the ccRCC data from SNUH and TCGA. H.K. and J.-H.C. reviewed the NLST cases and selected the risk-related pathological features. J.H.P. and K.C.M. reviewed the ccRCC cases. M.J. updated the patients’ metadata. C.L. provided the metastasis-related metadata. S.O. and J.S. analysed and reviewed the statistical model used in the study. S.K. and K.C.M. conceived the project. All authors contributed to the preparation of the manuscript.

Corresponding authors

Correspondence to Kyung Chul Moon or Sunghoon Kwon.

Ethics declarations

Competing interests

Y.L., J.H.P., S.O., K.S., K.C.M. and S.K. are listed as inventors on patents (1020220029619), applied for by Seoul National University, that cover the technology related to this work. The other authors declare no competing interests.

Additional information

Extended data

Extended Data Fig. 1 Workflow of TEA-graph and WSI interpretation.

a, Pathologist’s workflow for identifying prognostic regions on a WSI. b, Workflow of TEA-graph for learning and interpreting contextual features on a WSI. c, Risk-visualized WSI and an example of risk-region detection using a connected graph.

Extended Data Fig. 2 Measurement of the correlation between risk and IG values.

a, Numbers of patches in the low, mid and high IG groups for each risk group. Boxes span the IQR (Q1 to Q3), the centre line indicates the median, and whiskers extend to Q3 + 1.5 × IQR (maxima) and Q1 − 1.5 × IQR (minima) (n = 259 (low risk and mid risk), n = 223 (high risk)). b, Scatter plot of risk values versus IG values. c, Merged scatter plot of risk values versus the numbers of patches in each IG group. d–f, Scatter plots of risk values versus the numbers of patches for each IG group. g, Kaplan–Meier analysis according to IG values quantized in 10% intervals of the entire set of IG values. P-values were calculated using a two-sided log-rank test (n = 831).

Source data
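The box-plot convention stated in panel a (box from Q1 to Q3, centre line at the median, limits at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR) is Tukey's rule. A minimal NumPy sketch of those summary statistics follows; the function name `box_plot_stats` is hypothetical, and NumPy's default linear percentile interpolation is an assumption, since the caption does not state which quantile method was used.

```python
import numpy as np


def box_plot_stats(values):
    """Tukey box-plot summary (hypothetical helper, not the authors' code).

    Box: Q1..Q3, centre line: median, limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    Quantiles use NumPy's default linear interpolation (an assumption).
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Points beyond the limits would normally be drawn as individual outliers.
    outliers = x[(x < lower_limit) | (x > upper_limit)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower": lower_limit,
        "upper": upper_limit,
        "outliers": outliers,
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["q1"], stats["median"], stats["q3"], stats["outliers"])
```

Plotting libraries such as matplotlib (`boxplot` with its default `whis=1.5`) draw each whisker to the most extreme data point that lies within these limits rather than to the limit value itself.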

Extended Data Fig. 3 Validation of the TEA-graph on the external NLST dataset.

a, Kaplan–Meier survival analysis using the original stage (left) and the TEA-graph predicted-risk value (right). P-values were calculated using a two-sided log-rank test (n = 445). b, Numbers of patches belonging to the low, mid and high IG groups for each risk group. Boxes span the IQR (Q1 to Q3), the centre line indicates the median, and whiskers extend to Q3 + 1.5 × IQR (maxima) and Q1 − 1.5 × IQR (minima) (n = 378 for each risk group). c, Merged scatter plot of risk values versus the numbers of patches belonging to each IG group. d, Predicted-risk heat map of NLST patients. Scale bar, 4 mm. e, Risk-related contextual features predicted by the TEA-graph. Scale bar, 400 μm.
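Several panels report P-values from a two-sided log-rank test on Kaplan–Meier curves. The sketch below implements the textbook Kaplan–Meier estimator and the two-group log-rank statistic with NumPy and the standard library. It assumes right-censored data with a 0/1 event indicator; it is an illustration, not the authors' analysis code, and the function names are hypothetical.

```python
import math

import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns the distinct event times and S(t) just after each of them.
    """
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=int)
    event_times = np.unique(t[e == 1])
    surv, s = [], 1.0
    for tj in event_times:
        n_at_risk = np.sum(t >= tj)          # patients still under observation at tj
        d = np.sum((t == tj) & (e == 1))     # events occurring exactly at tj
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sided log-rank test for two groups (chi-square with 1 degree of freedom)."""
    t_a, e_a = np.asarray(times_a, dtype=float), np.asarray(events_a, dtype=int)
    t_b, e_b = np.asarray(times_b, dtype=float), np.asarray(events_b, dtype=int)
    t_all = np.concatenate([t_a, t_b])
    e_all = np.concatenate([e_a, e_b])
    o_minus_e, variance = 0.0, 0.0
    for tj in np.unique(t_all[e_all == 1]):
        n = np.sum(t_all >= tj)                       # at risk, both groups
        n_a = np.sum(t_a >= tj)                       # at risk, group A
        d = np.sum((t_all == tj) & (e_all == 1))      # events, both groups
        d_a = np.sum((t_a == tj) & (e_a == 1))        # events, group A
        o_minus_e += d_a - d * n_a / n                # observed minus expected in A
        if n > 1:
            # Hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # P(X > chi2) for X ~ chi-square(1) equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return float(chi2), p_value
```

In practice a maintained survival-analysis package (for example `lifelines`) would normally be used; the explicit loop here only makes the observed-minus-expected calculation visible.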

Extended Data Fig. 4 Heterogeneous tumoral architecture features extracted by TEA-graph and effect of the attention mechanism on extracting contextual features.

a, Node IG values are represented by node color and edge attention scores by edge color. Hemorrhagic cyst (top), patchy stromal hemorrhage (bottom). Scale bar, 100 μm (left), 400 μm (right). Patch size in the last column is 80 μm. b, Proportion of node pairs with low or high feature correlation among pairs connected by high- or low-attention edges. c, Median feature correlation between two nodes connected by a low- or high-attention edge within the low, mid and high IG groups. b,c, P-values were calculated using a two-sample t-test (n = 944).

Source data
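Panels b and c of Extended Data Fig. 4 compare feature correlations between node pairs joined by low- or high-attention edges. The caption does not define the correlation measure or the attention threshold, so the sketch below assumes Pearson correlation between the two nodes' feature vectors, a user-chosen attention threshold, and the equal-variance (Student) form of the two-sample t-test; all function names are hypothetical. It computes the t statistic and its degrees of freedom; the corresponding P-value could be obtained from `scipy.stats.ttest_ind`, which is not called here to keep the example dependency-light.

```python
import numpy as np


def edge_feature_correlations(node_features, edges):
    """Pearson correlation between the feature vectors of each edge's two nodes.

    Pearson correlation is an assumption; the caption does not name the measure.
    """
    x = np.asarray(node_features, dtype=float)
    return np.array([np.corrcoef(x[i], x[j])[0, 1] for i, j in edges])


def split_by_attention(corrs, attention, threshold):
    """Split edge correlations into low- and high-attention groups.

    `threshold` is a hypothetical cut-off on the edge attention score.
    """
    corrs = np.asarray(corrs, dtype=float)
    attention = np.asarray(attention, dtype=float)
    return corrs[attention < threshold], corrs[attention >= threshold]


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return float(t), n_a + n_b - 2
```

Applying `np.median` to each group returned by `split_by_attention` gives per-group medians of the kind plotted in panel c.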

Extended Data Fig. 5 Comparison of histopathological prognostic features between survival and metastasis events.

a, Predicted-risk heat maps for two different events in the same patient, who experienced lung metastasis. Scale bar, 4 mm. b, Pathological features of connected graphs that had high IG values and appeared in both survival and metastasis events. Scale bar, 400 μm. c, Pathological features of connected graphs that had high IG values and appeared predominantly in survival events. Scale bar, 400 μm.

Extended Data Fig. 6 Contextual pathological characteristics of high IG group.

a, Additional pathological images representing the patch-level cluster characteristics of the high IG group. Patch size is 80 μm. b, Edge-distribution (connectivity) heat maps of all subgraph clusters of the high IG group. c, Graph-level Kaplan–Meier analysis of the selected subgraph in graph cluster six of the high IG group. P-values were calculated using a two-sided log-rank test (n = 831). d, Additional examples of pathological features of subgraphs with high similarity to the selected subgraph in graph cluster six of the high IG group. Scale bar, 400 μm.

Source data
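The edge-distribution (connectivity) heat maps in panel b summarize how often patches from different patch-level clusters are connected in the graph. One plausible way to build such a matrix, assuming an undirected patch graph and one cluster label per patch, is sketched below; `cluster_connectivity` and its row normalization are hypothetical, not the published procedure. Each undirected edge is counted from both endpoints, so after row normalization entry (a, b) is the fraction of neighbours of cluster-a patches that belong to cluster b.

```python
import numpy as np


def cluster_connectivity(edges, node_cluster, n_clusters, normalize=True):
    """Hypothetical cluster-to-cluster connectivity matrix for a patch graph.

    edges        : iterable of (i, j) node-index pairs (undirected edges)
    node_cluster : patch-level cluster label (0..n_clusters-1) for every node
    Returns an n_clusters x n_clusters matrix. Raw entry (a, b) counts how many
    times a patch in cluster a has a neighbour in cluster b; with normalize=True
    each row is divided by its sum.
    """
    labels = np.asarray(node_cluster, dtype=int)
    counts = np.zeros((n_clusters, n_clusters), dtype=float)
    for i, j in edges:
        a, b = labels[i], labels[j]
        counts[a, b] += 1  # endpoint i (cluster a) sees a neighbour in cluster b
        counts[b, a] += 1  # endpoint j (cluster b) sees a neighbour in cluster a
    if not normalize:
        return counts
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of clusters with no edges stay at zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

The resulting matrix can be rendered directly as a heat map (for example with matplotlib's `imshow`).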

Extended Data Fig. 7 Contextual pathological characteristics of low IG group.

a, Additional pathological images representing the patch-level cluster characteristics of the low IG group. Patch size is 80 μm. b, Edge-distribution (connectivity) heat maps of all subgraph clusters of the low IG group. c, Graph-level Kaplan–Meier analysis of the selected subgraph in graph cluster three of the low IG group. P-values were calculated using a two-sided log-rank test (n = 831). d, Additional examples of pathological features of subgraphs with high similarity to the selected subgraph in graph cluster three of the low IG group. Scale bar, 400 μm.

a, Patch-level cluster characteristics of the high IG group. b, Difference between the areas under the curve of the low- and high-count Kaplan–Meier plots, which reflects the risk of each cluster (left), and Kaplan–Meier plot of subgraph cluster three (right) (n = 514). c, Example of a connected patch cluster in subgraph cluster three (left) and the edge distribution (connectivity) of subgraph cluster three (right). The connectivity shows which patch clusters interact with each other more frequently. d, t-SNE plot of high IG subgraphs clustered by k-means using graph features. e, Examples of subgraph-level pathological features of subgraph cluster three. Scale bar, 400 μm. f, Edge-distribution (connectivity) heat maps of the other subgraph clusters of the unfavourable (high IG) group. a,c, Patch size is 80 μm.
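Panel b above scores each cluster by the difference between the areas under the Kaplan–Meier curves of patients with low and high counts of that cluster. The caption does not specify how patients are split or over what time range the area is taken, so the sketch below assumes a median split on the per-patient cluster count and an area computed from time 0 to a fixed `horizon` (the restricted mean survival time); all names are hypothetical. With the sign convention AUC(low) − AUC(high), a positive value means that patients with many patches of the cluster tend to have shorter survival.

```python
import numpy as np


def km_curve(times, events):
    """Kaplan-Meier survival S(t) just after each distinct event time."""
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=int)
    event_times = np.unique(t[e == 1])
    surv, s = [], 1.0
    for tj in event_times:
        n_at_risk = np.sum(t >= tj)
        d = np.sum((t == tj) & (e == 1))
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)


def km_area(times, events, horizon):
    """Area under the Kaplan-Meier step curve from 0 to `horizon` (assumed cut-off)."""
    event_times, surv = km_curve(times, events)
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for tj, sj in zip(event_times, surv):
        if tj >= horizon:
            break
        area += prev_s * (tj - prev_t)  # S(t) is constant between event times
        prev_t, prev_s = tj, sj
    area += prev_s * (horizon - prev_t)
    return float(area)


def cluster_auc_difference(patch_counts, times, events, horizon):
    """AUC(low-count group) - AUC(high-count group), median split (an assumption)."""
    counts = np.asarray(patch_counts, dtype=float)
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=int)
    high = counts > np.median(counts)
    return km_area(t[~high], e[~high], horizon) - km_area(t[high], e[high], horizon)
```

Under this assumed sign convention, a high IG cluster with a negative difference would behave opposite to expectation, which is the sense in which the captions speak of false-positive clusters.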

a, Kaplan–Meier plot of subgraph cluster zero of the high IG group (n = 831). b, Graph-level Kaplan–Meier analysis of the selected angiogenesis-related subgraph in subgraph cluster zero of the high IG group (n = 831). c, Edge distribution (connectivity) of subgraph cluster zero and the angiogenesis-related connectivity. d, Example of subgraph-level angiogenesis-related pathological features (left) and the connected patch cluster (right). Patch size is 80 μm. e, Additional examples of pathological features of subgraphs with high similarity to the selected angiogenesis-related subgraph in graph cluster zero of the high IG group. Scale bar, 400 μm. a,b, P-values were calculated using a two-sided log-rank test.

Source data

Extended Data Fig. 10 Pathological features of each IG group misclassified by TEA-graph.

a, False-positive clusters of the low IG group, which have small area-under-the-curve values for the Kaplan–Meier plot. b,c, Pathological features of low IG group cluster five. d, False-positive cluster of the high IG group, which has a negative area-under-the-curve value for the Kaplan–Meier plot. e, Pathological features of high IG group cluster two. b,c,e, Scale bar, 400 μm.

