Molecular subtyping for clinically defined breast cancer subgroups

Xi Zhao et al. Breast Cancer Res. 2015.

Abstract

Introduction: Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the training cohort used to derive the classifier.

Methods: We propose a subgroup-specific gene-centering method to perform molecular subtyping on a study cohort that has a skewed distribution of clinicopathological characteristics relative to the training cohort. On such a study cohort, we center each gene on a specified percentile, where the percentile is determined from a subgroup of the training cohort with clinicopathological characteristics similar to the study cohort. We demonstrate our method using the PAM50 classifier and its associated University of North Carolina (UNC) training cohort. We considered study cohorts with skewed clinicopathological characteristics, including subgroups composed of a single prototypic subtype of the UNC-PAM50 training cohort (n = 139), an external estrogen receptor (ER)-positive cohort (n = 48) and an external triple-negative cohort (n = 77).
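The centering step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the reference point is the per-gene median of the full training cohort, that the "specified percentile" is the position of that median within the matching training subgroup, and that expression matrices are laid out as genes by samples. The function names `subgroup_percentiles` and `center_on_percentiles` and the toy values are hypothetical.

```python
import numpy as np

def subgroup_percentiles(train_expr, in_subgroup):
    """Per gene, the percentile of the training subgroup at which the
    global training median falls (assumed reference point)."""
    train_expr = np.asarray(train_expr, dtype=float)   # genes x samples
    in_subgroup = np.asarray(in_subgroup, dtype=bool)  # one flag per sample
    global_center = np.median(train_expr, axis=1)
    subgroup_expr = train_expr[:, in_subgroup]
    # Share of subgroup samples at or below the global median, as a percentile.
    return 100.0 * np.mean(subgroup_expr <= global_center[:, None], axis=1)

def center_on_percentiles(study_expr, percentiles):
    """Subtract, per gene, the study cohort's value at the given percentile."""
    study_expr = np.asarray(study_expr, dtype=float)
    baseline = np.array([np.percentile(row, p)
                         for row, p in zip(study_expr, percentiles)])
    return study_expr - baseline[:, None]

# Toy training cohort: one gene, four ER-negative then four ER-positive samples.
train = np.array([[-3.0, -2.0, -1.0, 1.0, -1.0, 1.0, 2.0, 3.0]])
er_pos = np.array([False] * 4 + [True] * 4)
pct = subgroup_percentiles(train, er_pos)

# Toy study cohort made up of ER-positive cases only.
study = np.array([[0.0, 1.0, 2.0, 3.0, 4.0]])
centered = center_on_percentiles(study, pct)
standard = study - np.median(study, axis=1, keepdims=True)
print(pct, centered, standard)
```

In this toy case the global training median sits at the 25th percentile of the ER-positive subgroup, so the ER-positive study cohort is centered on its own 25th percentile rather than on its median, which shifts every centered value up by one unit relative to standard centering.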

Results: When applied to the prototypic tumor subsets of the PAM50 training cohort, subgroup-specific gene centering improved prediction performance, with accuracies between 77% and 100% compared with accuracies between 17% and 33% from standard gene centering. It reduced classification error rates on the ER-positive (11% versus 28%; P = 0.0389), the ER-negative (5% versus 41%; P < 0.0001) and the triple-negative (11% versus 56%; P = 0.1336) subgroups of the PAM50 training cohort. In addition, it produced higher accuracy for subtyping study cohorts composed of varying proportions of ER-positive versus ER-negative cases. Finally, it increased the percentage of assigned luminal subtypes on the external ER-positive cohort and of the basal-like subtype on the external triple-negative cohort.

Conclusions: Gene centering is often necessary to accurately apply a molecular subtype classifier. Compared with standard gene centering, our proposed subgroup-specific gene centering produced more accurate molecular subtype assignments in a study cohort with skewed clinicopathological characteristics relative to the training cohort.

Figures

Figure 1

Effect of estrogen receptor distribution on molecular subtype assignments. The University of North Carolina (UNC) cohort is the PAM50 training cohort. Only samples with available prototypic tumor subtypes and available estrogen receptor (ER) status are shown (n = 118). In each horizontal strip, the vertical bands represent individual patients and are arranged in the same order in every strip. First, we considered the UNC cohort, which had a balanced ER-positive to ER-negative distribution of 46% ER-positive (54/118) and 54% ER-negative (64/118), represented by the shaded pie chart labeled “UNC cohort.” In the first strip at the top, labeled “ER status,” the ER status in the UNC cohort is depicted as dark versus light gray, representing ER-positive versus ER-negative cases, respectively. In the second strip, labeled “Original subtype assignment,” the original subtype assignments in the UNC cohort are shown. Next, we considered a subset of the UNC cohort (n = 75), which we created by sampling ER-positive and ER-negative cases disproportionately, with 15% ER-positive (11/75) and 85% ER-negative (64/75), as represented by the pie chart labeled “UNC subset.” The third strip, labeled “Standard gene centering,” shows the subtypes assigned by standard gene centering on this UNC subset, in which ER status is disproportionately distributed. The misclassification rate is 33.3% (25/75) relative to the first 75 bands in the second strip. The bottom strip, labeled “Subgroup-specific gene centering,” shows the subtypes assigned by the proposed subgroup-specific gene centering on the same UNC subset. The misclassification rate is 5.3% (4/75), and the classification is close to the original classification shown in the first 75 bands of the second strip, labeled “Original subtype assignment.” Her2, Human epidermal growth factor receptor 2; LumA, Luminal A; LumB, Luminal B.
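The misclassification rates quoted in this caption are simple proportions of mismatched labels. A minimal sketch of that calculation follows; the helper name `misclassification_rate` and the toy label lists are hypothetical, and only the counts 25/75 and 4/75 come from the caption.

```python
def misclassification_rate(reference, predicted):
    """Fraction of samples whose predicted subtype differs from the reference."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    mismatches = sum(r != p for r, p in zip(reference, predicted))
    return mismatches / len(reference)

# Toy labels: one of four samples is assigned a different subtype.
reference = ["Basal", "Basal", "Basal", "LumA"]
predicted = ["Basal", "Her2", "Basal", "LumA"]
toy_rate = misclassification_rate(reference, predicted)

# Counts reported in the caption, expressed as percentages.
standard_pct = round(100 * 25 / 75, 1)
subgroup_pct = round(100 * 4 / 75, 1)
print(toy_rate, standard_pct, subgroup_pct)
```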

Figure 2

Overview of subgroup-specific gene-centering algorithm. (a) Distribution of gene expression for a representative gene from the entire University of North Carolina (UNC) training cohort, with the global mean represented by the gray vertical dotted line. (b) The gene expression baseline is approximated by the global mean (gray dotted line) shown on the global distribution, represented as a mixture of estrogen receptor (ER)-positive cases (shown in pink) and ER-negative cases (shown in green). (c) and (d) The global median is located on different percentiles for the ER-positive and ER-negative cases, and each differs with respect to each subgroup mean. (e) The distribution of gene expression for the same gene in a study cohort composed of only ER-positive cases. The baseline value for subgroup-specific gene centering is estimated at the corresponding percentile of the ER-positive subgroup in the study cohort and compared with the median value, represented by the red vertical dotted line. The difference between these values is the error introduced by standard gene centering. (f) Similar to (e), but for the ER-negative subgroup.

Figure 3

Comparison of standard and subgroup-specific gene centering for predicting the individual molecular subtypes on the prototypic datasets. The bar plot shows the counts of the predicted subtype classes in each prototypic tumor dataset. Her2, Human epidermal growth factor receptor 2; LumA, Luminal A; LumB, Luminal B.

Figure 4

Comparison of various data transformation strategies for predicting molecular subtypes on study cohorts with varying estrogen receptor proportions. Datasets were constructed with percentages of estrogen receptor (ER)-positive cases ranging from 0% to 100%. The ER-positive and ER-negative samples were randomly drawn from the University of North Carolina set. Error rate is plotted against the ER composition for three strategies: no gene centering, standard gene centering and subgroup-specific gene centering.
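One way to build such mixed cohorts is to draw ER-positive and ER-negative samples without replacement at a chosen ratio. The sketch below is an assumption about how this could be done, not the procedure used in the paper: the function name `sample_er_mixture`, the cohort size of 40 and the random seed are hypothetical, while the 54 ER-positive and 64 ER-negative counts are those given for the UNC cohort in Figure 1.

```python
import numpy as np

def sample_er_mixture(er_positive, n_total, frac_positive, rng):
    """Return sample indices with the requested share of ER-positive cases."""
    er_positive = np.asarray(er_positive, dtype=bool)
    pos_idx = np.flatnonzero(er_positive)
    neg_idx = np.flatnonzero(~er_positive)
    n_pos = int(round(n_total * frac_positive))
    n_neg = n_total - n_pos
    # Draw without replacement so that no patient appears twice.
    chosen_pos = rng.choice(pos_idx, size=n_pos, replace=False)
    chosen_neg = rng.choice(neg_idx, size=n_neg, replace=False)
    return np.concatenate([chosen_pos, chosen_neg])

rng = np.random.default_rng(0)
er_status = np.array([True] * 54 + [False] * 64)  # counts from Figure 1

for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    idx = sample_er_mixture(er_status, 40, frac, rng)
    print(frac, int(er_status[idx].sum()), len(idx))
```

Each sampled cohort could then be passed through the chosen centering strategy and the classifier, with the error rate recorded against the ER-positive fraction.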
