Comprehensive Cross-Population Analysis of High-Grade Serous Ovarian Cancer Supports No More Than Three Subtypes

Gregory P Way et al. G3 (Bethesda). 2016.

Abstract

Four gene expression subtypes of high-grade serous ovarian cancer (HGSC) have been previously described. In these early studies, a fraction of samples that did not fit well into the four subtype classifications were excluded. Therefore, we sought to systematically determine the concordance of transcriptomic HGSC subtypes across populations without removing any samples. We created a bioinformatics pipeline to independently cluster the five largest mRNA expression datasets using k-means and nonnegative matrix factorization (NMF). We summarized differential expression patterns to compare clusters across studies. While previous studies reported four subtypes, our cross-population comparison does not support four. Because these results contrast with previous reports, we attempted to reproduce analyses performed in those studies. Our results suggest that early results favoring four subtypes may have been driven by the inclusion of serous borderline tumors. In summary, our analysis suggests that either two or three, but not four, gene expression subtypes are most consistent across datasets.

Keywords: molecular subtypes; ovarian cancer; reproducibility; unsupervised clustering.

Copyright © 2016 Way et al.

Figures

Figure 1

Significance analysis of microarray (SAM) moderated t score Pearson correlation heatmaps reveal consistency across datasets. (A) Correlations across datasets for k-means with k = 2. (B) Correlations across datasets for k-means with k = 3. (C) Correlations across datasets for k-means with k = 4. TCGA, The Cancer Genome Atlas.
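The comparison behind these heatmaps can be sketched in a few lines: score each gene for one cluster against all other samples in the same dataset, then correlate the resulting score vectors between datasets. The sketch below is illustrative only and is not the authors' pipeline. It swaps in a plain Welch t statistic for the SAM moderated t score, uses invented toy expression values, and every function name (`welch_t`, `cluster_scores`, `pearson`, `cross_dataset_correlations`) is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic between two samples.

    Assumption: used here as a simple stand-in for the SAM moderated t score.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def cluster_scores(expr, labels, cluster):
    """Per-gene score of one cluster versus all remaining samples.

    expr is a list of genes, each a list of per-sample expression values.
    """
    scores = []
    for gene in expr:
        inside = [v for v, lab in zip(gene, labels) if lab == cluster]
        outside = [v for v, lab in zip(gene, labels) if lab != cluster]
        scores.append(welch_t(inside, outside))
    return scores


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cross_dataset_correlations(scores_a, scores_b):
    """Correlate every cluster of dataset A with every cluster of dataset B."""
    return {
        (ca, cb): pearson(sa, sb)
        for ca, sa in scores_a.items()
        for cb, sb in scores_b.items()
    }


# Toy data (invented): 4 genes x 6 samples per dataset, k = 2 cluster labels.
expr_a = [
    [5.0, 5.5, 6.0, 1.0, 1.5, 2.0],
    [1.0, 1.2, 0.8, 4.0, 4.4, 3.9],
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    [7.0, 6.5, 7.2, 3.0, 2.5, 3.3],
]
labels_a = [0, 0, 0, 1, 1, 1]

expr_b = [
    [6.1, 5.2, 1.1, 1.9, 5.8, 1.4],
    [0.9, 1.1, 4.2, 3.8, 1.3, 4.1],
    [2.0, 1.8, 2.1, 2.2, 1.9, 2.0],
    [6.8, 7.1, 2.9, 3.1, 6.6, 2.7],
]
labels_b = [1, 1, 0, 0, 1, 0]  # cluster numbering is arbitrary per dataset

scores_a = {c: cluster_scores(expr_a, labels_a, c) for c in (0, 1)}
scores_b = {c: cluster_scores(expr_b, labels_b, c) for c in (0, 1)}
corr = cross_dataset_correlations(scores_a, scores_b)
```

Because cluster numbering is arbitrary within each dataset, matching clusters appear as strongly positive entries in `corr` regardless of their labels. In this toy example, cluster 0 of the first dataset pairs with cluster 1 of the second.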

Figure 2

Significance analysis of microarray (SAM) moderated t score Pearson correlation heatmaps of clusters formed by k-means clustering and NMF clustering reveal consistency across clustering methods. Within-dataset results are shown for both methods when setting each algorithm to find 2, 3, and 4 clusters. NMF, nonnegative matrix factorization; TCGA, The Cancer Genome Atlas.

Figure 3

Comparing NMF consensus clustering in the Tothill dataset. Each panel shows consensus clustering for k = 2 to k = 6 across 10 NMF initializations, alongside cophenetic correlation results for k = 2 to k = 8. (A) Tothill dataset (n = 260) with borderline samples (n = 18) not removed prior to clustering. (B) Tothill dataset with borderline samples removed (n = 242).
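For readers unfamiliar with consensus clustering, the quantity plotted in such panels is a sample-by-sample matrix giving the fraction of runs in which two samples land in the same cluster. The following is a minimal sketch under stated assumptions: the label vectors are invented rather than derived from NMF, the function name `consensus_matrix` is hypothetical, and the cophenetic correlation step is not shown.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs is a list of label vectors, one per clustering initialization.
    Only co-membership matters, so label permutations between runs are harmless.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


# Invented label vectors for 5 samples over 4 runs (k = 2).
runs = [
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 1, 1, 1],  # sample 2 changes cluster in this run
    [0, 0, 0, 1, 1],
]
consensus = consensus_matrix(runs)
```

Entries near 0 or 1 indicate pairs that are assigned stably across initializations, while intermediate values (here, every pair involving sample 2) flag samples whose assignment depends on the starting point.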
