Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival

Proc Natl Acad Sci U S A. 2005 Mar 8; 102(10): 3738–3743.

From the Cover

Howard Y. Chang,a,b,c Dimitry S. A. Nuyten,c,d,e Julie B. Sneddon,b Trevor Hastie,f Robert Tibshirani,f Therese Sørlie,b,g Hongyue Dai,h,i Yudong D. He,h,i Laura J. van't Veer,d,i Harry Bartelink,e Matt van de Rijn,j Patrick O. Brown,b,k,l and Marc J. van de Vijverd,l

aProgram in Epithelial Biology, Departments of bBiochemistry, fHealth Research and Policy, and jPathology, and kHoward Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305; Departments of dDiagnostic Oncology and eRadiation Oncology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands; hRosetta Inpharmatics, Seattle, WA 98109; and gNorwegian Radium Hospital, 0310 Oslo, Norway

cH.Y.C. and D.S.A.N. contributed equally to this work.

iL.J.V. holds equity in Agendia, which has commercial interests in the 70-gene prognosis signature. H.D. and Y.D.H. hold shares of Merck.

Contributed by Patrick O. Brown, January 5, 2005

Copyright © 2005, The National Academy of Sciences

Freely available online through the PNAS open access option.


Abstract

Based on the hypothesis that features of the molecular program of normal wound healing might play an important role in cancer metastasis, we previously identified consistent features in the transcriptional response of normal fibroblasts to serum, and used this “wound-response signature” to reveal links between wound healing and cancer progression in a variety of common epithelial tumors. Here, in a consecutive series of 295 early breast cancer patients, we show that both overall survival and distant metastasis-free survival are markedly diminished in patients whose tumors expressed this wound-response signature compared to tumors that did not express this signature. A gene expression centroid of the wound-response signature provides a basis for prospectively assigning a prognostic score that can be scaled to suit different clinical purposes. The wound-response signature improves risk stratification independently of known clinico-pathologic risk factors and previously established prognostic signatures based on unsupervised hierarchical clustering (“molecular subtypes”) or supervised predictors of metastasis (“70-gene prognosis signature”).

Keywords: microarray, prognosis, wound healing, metastasis, treatment decision

In recent years, microarray analysis of gene expression patterns has provided a way to improve the diagnosis and risk stratification of many cancers (1–6). Unsupervised analysis of global gene expression patterns has identified molecularly distinct subtypes of cancer, distinguished by extensive differences in gene expression, in diseases that were considered homogeneous based on classical diagnostic methods (1, 3, 4, 7). The molecular subtypes are often associated with different clinical outcomes. Global gene expression patterns can also be examined for features that correlate with clinical behavior to create prognostic signatures (5, 8). For example, in breast cancer, a poor prognosis gene expression signature in the primary tumor can accurately predict the risk of subsequent metastases, independent of other well known clinico-pathologic risk factors (9). However, because supervised methods are driven by class or outcome prediction, and the complexity of the models considered is necessarily limited, the resulting gene sets may be excellent prognostic markers without revealing much about the underlying biological mechanisms.

Gene expression patterns provide a common language among biologic phenomena and allow an alternative approach to infer physiologic and molecular mechanisms from complex human disease states (1, 10, 11, 12). Starting with the gene expression profile of cells manipulated in vitro to simulate a biologic process, the expression profile can then be used to interpret the gene expression data of human cancers and test specific hypotheses. To understand the similarities between wound healing and cancer, Chang et al. (10) identified a set of “core serum response” (CSR) genes and their canonical expression pattern in fibroblasts activated with serum, the soluble fraction of clotted blood and an important initiator of wound healing in vivo. The CSR genes were chosen to minimize overlap with cell cycle genes, but instead appeared to represent other important processes in wound healing, such as matrix remodeling, cell motility, and angiogenesis, processes that are likely also to contribute to cancer invasion and metastasis. In several common epithelial tumors such as breast, lung, and gastric cancers, expression of the wound-response signature predicted poor overall survival and increased risk of metastasis (10). These initial findings demonstrate the promise of using hypothesis-driven gene expression signatures to provide insights from existing gene expression profiles of cancers. However, as in other methodologies, reproducibility and scales for interpretation need to be evaluated before this strategy can be generally adopted for biologic discovery and clinical use.

The best validation of a gene signature's prognostic value is to test its ability to predict outcome in large independent data sets. Here we examine a database of 295 breast cancer patients from the Netherlands Cancer Institute that had previously been used to identify and validate a prognostic gene expression profile defined by a set of 70 genes (5, 9). We used this data set to test the reproducibility of the association between the wound-response signature and breast cancer progression, and to investigate how the information from diverse gene expression signatures identified by various means might be integrated both biologically and for clinical use.

Materials and Methods

Tumor Gene Expression Profiles. RNA isolation, labeling of complementary RNA, competitive hybridization of each tumor cRNA with pooled reference cRNA from all samples to 25,000-element oligonucleotide microarrays, and measurement of expression ratios were described (5). Detailed patient information has been described (9). Adjuvant setting was based on national guidelines or determined by participation in clinical trials at the time of diagnosis. Ten of the 151 patients who had lymph node-negative disease and 120 of the 144 who had lymph node-positive disease had received adjuvant systemic therapy consisting of chemotherapy (90 patients), hormonal therapy (20), or both (20). Clinical and gene expression data are available at http://microarray-pubs.stanford.edu/wound_NKI, www.rii.com/publications, or http://microarrays.nki.nl.

Data Analysis. Prognostic signatures. Genes on Stanford cDNA microarrays and Rosetta/NKI oligonucleotide microarrays were mapped between different platforms by using Unigene identifiers (build 158, release date January 18, 2003). This older build of Unigene was used to allow comparison with two published cross-platform analyses (10, 13). In the unsupervised analysis, 295 tumor samples were grouped by similarity of the expression pattern of the CSR genes by average linkage clustering by using the software cluster (14); the gene expression values were centered by mean. The samples were segregated into two classes based on the first bifurcation of the clustering dendrogram; the two classes were identified as “activated” vs. “quiescent” by the predominant expression of the serum-induced and serum repressed CSR genes (10). Classification of the tumors as having a good prognosis signature or a poor prognosis signature based on the expression of 70 genes was as described (9). The five-class “intrinsic gene” signature was assigned by matching the expression value of the intrinsic genes in the NKI dataset to the nearest expression centroid of the five classes as described; samples that did not have correlation >0.1 to any centroid were termed unclassified (13). A total of 509 probes representing 431 of 487 intrinsic genes were successfully identified in the NKI data set.
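The nearest-centroid assignment of the intrinsic-gene subtypes described above can be illustrated with a minimal sketch. It assumes each tumor and each subtype centroid is a list of expression values over the same ordered genes. The function names, the two toy centroids, and the strict comparison at the 0.1 cutoff are illustrative assumptions, not the published implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / sqrt(sxx * syy)

def assign_subtype(sample, centroids, min_corr=0.1):
    """Return (label, correlation) for the best-matching centroid.

    Samples whose best correlation does not exceed min_corr are
    labeled "unclassified" (assumed strict inequality).
    """
    best_name, best_r = None, float("-inf")
    for name, centroid in centroids.items():
        r = pearson(sample, centroid)
        if r > best_r:
            best_name, best_r = name, r
    if best_r > min_corr:
        return best_name, best_r
    return "unclassified", best_r

# Toy centroids over four hypothetical genes (not real expression data).
centroids = {
    "basal-like": [1.0, -1.0, 0.5, -0.5],
    "luminal A": [-1.0, 1.0, -0.5, 0.5],
}
label_close, _ = assign_subtype([0.9, -1.1, 0.4, -0.6], centroids)
label_flat, _ = assign_subtype([1.0, 1.0, -1.0, -1.0], centroids)
```

In this toy case the first sample tracks the basal-like centroid closely, whereas the second is uncorrelated with both centroids and therefore remains unclassified.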

Survival analysis. Overall survival was defined by death from any cause. Distant metastasis-free probability (DMFP) was defined by a distant metastasis as a first recurrence event; data on all patients were censored on the date of the last follow-up visit, death from causes other than breast cancer, the recurrence of local or regional disease, or the development of a second primary cancer, including contralateral breast cancer. Kaplan–Meier survival curves were compared by the Cox–Mantel log-rank test in winstat for excel (R. Fitch Software, Staufen, Germany). Multivariate analysis by the Cox proportional hazard method was performed by using the software package spss 11.5 (SPSS, Chicago).
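For readers unfamiliar with how censored follow-up enters a Kaplan–Meier curve, the following is a minimal sketch of the product-limit estimator. The analysis reported here used winstat; this code only illustrates the general technique. The function name and the coding of events (1 = event observed, 0 = censored) are assumptions made for the example, and the data are toy values.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event (e.g., distant metastasis) was observed,
            0 if the patient was censored at that time
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without lowering the curve.
        at_risk -= n_leaving
    return curve

# Toy data: four patients, the second censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The censored patient lowers the number at risk for later events but does not itself produce a step in the curve, which is how the censoring rules for DMFP above affect the estimate.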

Scaling the wound signature. The patient data set was randomized into two halves, one for training and one for testing. The two half sets were matched for all known clinical parameters and risk factors (Table 2, which is published as supporting information on the PNAS web site). The serum-activated fibroblast centroid was as described (10). Pearson correlation of the expression values of CSR genes of tumor samples to the serum-activated fibroblast centroid results in a quantitative score reflecting the wound-response signature for each sample. The higher the correlation value, the more the sample resembles serum-activated fibroblasts (“activated” wound-response signature). A negative correlation value indicates the opposite behavior and higher expression of the “quiescent” wound-response signature. The threshold for the two classes can be moved up or down from zero depending on the clinical goal. Sensitivity and specificity for predicting metastasis as the first recurrence event were calculated for every threshold between −1 and +1 for the correlation score in 0.05 increments. The threshold value of −0.15 correlation gave 90% sensitivity for metastasis prediction in the training set, and had equivalent performance in the test set.
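The scoring and threshold scan described above can be sketched as follows. This is a minimal illustration under stated assumptions: a tumor is called “activated” when its score is strictly greater than the threshold, and the selected threshold is the highest grid value that still reaches the target sensitivity. Neither rule is spelled out in the text, and all names and numbers in the code are toy values, not study data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sensitivity_specificity(scores, metastasis, threshold):
    """Treat score > threshold as a prediction of metastasis.

    Assumes at least one patient with and one without metastasis.
    """
    pairs = list(zip(scores, metastasis))
    tp = sum(1 for s, m in pairs if m and s > threshold)
    fn = sum(1 for s, m in pairs if m and s <= threshold)
    tn = sum(1 for s, m in pairs if not m and s <= threshold)
    fp = sum(1 for s, m in pairs if not m and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def pick_threshold(scores, metastasis, target_sensitivity=0.90):
    """Scan -1..+1 in 0.05 steps; keep the highest threshold that
    still meets the target sensitivity (an assumed selection rule)."""
    grid = [round(-1 + 0.05 * k, 2) for k in range(41)]
    passing = [t for t in grid
               if sensitivity_specificity(scores, metastasis, t)[0]
               >= target_sensitivity]
    return max(passing) if passing else None

# Score one toy tumor against a toy centroid (four hypothetical genes).
centroid = [1.0, 0.5, -0.5, -1.0]
tumor = [0.8, 0.1, -0.2, -0.9]
score = pearson(tumor, centroid)

# Toy training set: correlation scores and metastasis outcomes (1 = yes).
scores = [0.4, 0.2, 0.05, -0.3, 0.1, -0.2, -0.4, -0.5]
metastasis = [1, 1, 1, 0, 0, 0, 0, 0]
best = pick_threshold(scores, metastasis)
sens, spec = sensitivity_specificity(scores, metastasis, best)
```

In a split-sample design like the one described, `pick_threshold` would be run on the training half only, and the resulting threshold would then be applied unchanged to the test half.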

Decision tree analysis. To construct a decision tree, we considered all clinical risk factors and gene expression profiles by using the Cox proportional hazard model in spss, identified the dominant risk factor (most significant P value) to segregate patients, and reiterated the process on each subgroup until the patients or risk factors became exhausted. For gene expression signatures, we used the correlation value to each canonical centroid as a continuous variable to capture the possibility that different thresholds may be optimal in different subgroups. Because 60 patients with lymph node-negative disease in this series were used to train the 70-gene signature (5), performance of the decision tree incorporating the 70-gene signature was validated on the independent subset of patients with lymph node-positive disease. The threshold for the 70-gene signature has been reported (5); the threshold for the wound-response signature was chosen based on outcome data in the training set. Performance of the decision tree analysis was validated by equal performance in the randomized training and testing sets of patients. Support of the decision tree model by nonlinear multivariate analysis is described in Fig. 4, which is published as supporting information on the PNAS web site.

Results

Prognostic Value of a Wound Response Gene Expression Signature in Breast Cancer. To validate the prognostic value of the wound-response signature, we examined the expression of the core serum response genes in 295 consecutive patients with early breast cancer treated at the Netherlands Cancer Institute. A total of 442 probes representing 380 of 459 core serum response genes were successfully identified in this data set. To determine whether the CSR genes showed coherent expression in this set of patients, we grouped the expression patterns of genes and patients by similarity using hierarchical clustering (14). As reported in two smaller groups of breast cancer patients (10), the CSR genes showed a coordinated and biphasic pattern of expression (Fig. 1A). Breast cancer samples showed predominant expression of either serum-induced or serum-repressed genes, allowing us to assign each sample to the activated or quiescent wound-response signature. We tested for association between the wound-response signature and the occurrence and timing of several key clinical outcomes. Patients with the activated wound-response signature (n = 126, 42.7%) had a significantly decreased distant metastasis-free probability (P = 8.6 × 10⁻⁶) and overall survival (P = 5.6 × 10⁻¹⁰) in univariate analysis (Fig. 1 B and C). We noted that two small subsets of patients within the quiescent group had more heterogeneous gene expression patterns (Fig. 1A, yellow bars); these patients had an intermediate risk of metastasis and death from their tumors (Fig. 5, which is published as supporting information on the PNAS web site).

Fig. 1.

Performance of a “wound response” gene expression signature in predicting breast cancer progression. (A) Unsupervised hierarchical clustering of 295 breast cancer samples using 442 available CSR genes. Each row represents a gene; each column represents a sample. The level of expression of each gene, in each sample, relative to the mean level of expression of that gene across all of the samples, is represented by using a red–green color scale as shown in the key; gray indicates missing data. The transcriptional response of each gene in the fibroblast serum response is shown on the right bar (red indicates increased expression, and green indicates reduced expression in response to serum). The dendrogram at the top indicates the similarities among the samples in their expression of the CSR genes. Two main groups of tumors were observed: one group with a gene expression pattern similar to that of serum-activated fibroblasts, termed “activated,” and a second group with a reciprocal expression pattern of CSR genes, termed “quiescent.” Two small subsets of the quiescent group with more heterogeneous expression patterns are indicated by yellow bars. (B and C) Kaplan–Meier survival curves for the two classes of tumors. Patients with tumors expressing the activated wound-response signature had worse overall survival (OS) and DMFP compared to those with a quiescent wound-response signature. A total of 126 tumors were classified as activated, and 169 tumors were classified as quiescent. For activated vs. quiescent groups, 10-year OS is 50% vs. 84% (P = 5.6 × 10⁻¹⁰) and 10-year DMFP is 51% vs. 75% (P = 8.6 × 10⁻⁶), respectively.

We extended the analysis by separately testing the association between the activated wound-response signature and clinical outcome in subsets of breast cancer patients: patients with tumors ≤2.0 cm (T1 tumors), patients with lymph node negative disease, and patients with lymph node positive disease. In each of these subsets of breast cancer patients, patients with tumors showing an activated wound-response signature had significantly worse distant metastasis-free probability and overall survival compared to those with a quiescent wound signature (Fig. 6, which is published as supporting information on the PNAS web site). These results confirm that the wound-response signature is a powerful prognostic indicator in breast cancer.

A Scalable Prognostic Score Based on the Wound-Response Signature. The previous analyses depended on stratifying tumors within a predefined group, relative to which each tumor is evaluated. To allow practical clinical use of the wound signature, we needed to develop a method to evaluate the presence and strength of this signature independently in any newly diagnosed cancer. Classification by hierarchical clustering provided a mathematically reasonable but biologically arbitrary threshold for assigning a cancer to one of two groups; it is preferable to treat the threshold as a parameter and quantify the confidence with which patients are assigned to each class. The threshold for calling a tumor sample wound-like could then be scaled to favor sensitivity or specificity, depending on the clinical scenario. For example, in a screening setting, it may be preferable to favor sensitivity, whereas a clinical test to determine therapies associated with high morbidity should have high specificity.

The expression pattern of CSR genes in serum-treated fibroblasts served as the prototype of the “activated” profile of the wound-response signature (10). Thus, we considered a strategy based on the correlation of the expression profile of CSR genes in each tumor sample to a vector representing the centroid of the differential expression in response to serum in cultured fibroblasts from 10 anatomic sites (10). The correlation value to the gene expression centroid of serum-activated fibroblasts generates a continuous score that can be scaled. To evaluate the prognostic utility of the scalable wound signature, we performed a multivariate analysis of the wound signature together with known clinical and pathologic risk factors for breast cancer progression. The wound signature was an independent predictor of metastasis and death and provided more prognostic information than any of the classic risk factors in the multivariate model (Table 1; hazard ratios of 7 and 11, respectively; P < 0.01). Because the expression pattern of CSR genes in serum-activated fibroblasts was discovered completely independently of tumor gene expression data or clinical outcome, the prognostic power of the serum-activated fibroblast centroid in breast cancer provides strong evidence of the biologic link between a wound response and cancer progression.

Table 1.

Multivariate analysis of risk factors for death and metastasis as the first recurrence event in early breast cancer

| Risk factor | Death: hazard ratio (95% CI) | Death: P value | Metastasis: hazard ratio (95% CI) | Metastasis: P value |
|---|---|---|---|---|
| Wound response signature | 11.18 (2.52–49.6) | 0.001 | 7.25 (1.75–30.0) | 0.006 |
| Age, per decade | 0.66 (0.45–0.95) | 0.027 | 0.71 (0.50–1.00) | 0.052 |
| Diameter of tumor, per cm | 1.02 (0.98–1.04) | 0.270 | 1.03 (1.01–1.06) | 0.008 |
| Lymph node status, per positive node | 1.05 (0.94–1.16) | 0.371 | 1.10 (1.01–1.21) | 0.035 |
| Tumor grade | | | | |
| Grade 2 vs. 1 | 2.86 (0.96–8.5) | 0.059 | 1.87 (0.86–4.07) | 0.117 |
| Grade 3 vs. 1 | 3.14 (1.02–9.6) | 0.045 | 1.70 (0.74–3.90) | 0.212 |
| Vascular invasion | | | | |
| 1–3 vessels vs. 0 vessels | 0.95 (0.35–2.52) | 0.918 | 0.78 (0.32–1.87) | 0.57 |
| >3 vessels vs. 0 vessels | 1.88 (1.13–3.11) | 0.014 | 1.65 (1.02–2.68) | 0.043 |
| Estrogen receptor status, positive vs. negative | 0.49 (0.29–0.83) | 0.008 | 0.82 (0.47–1.41) | 0.468 |
| Mastectomy, vs. breast conserving therapy | 1.23 (0.76–2.01) | 0.401 | 1.28 (0.80–2.04) | 0.311 |
| No adjuvant therapy, vs. chemo or hormonal therapy | 1.42 (0.80–2.52) | 0.291 | 2.24 (1.32–3.82) | 0.003 |

Incorporating the Wound-Response Signature for Improved Clinical Decision Making. Because the wound-response signature provides improved risk prediction compared to traditional criteria, we examined the utility of a scalable wound signature in a clinical scenario: the decision to treat with adjuvant chemotherapy in early breast cancer. Approximately 30% of women with early breast cancer have clinically occult metastatic disease, and treatment with chemotherapy in addition to surgical excision and radiotherapy improves their outcomes (15). Uniform treatment of early breast cancer in women younger than 50 years of age with chemotherapy increases the 10-year survival from 71% to 78% (absolute benefit of 7%) for lymph node-negative disease and from 42% to 53% (absolute benefit of 11%) for lymph node-positive disease, but at the cost of exposing a large number of women who do not benefit (89–93% of all breast cancer patients) to the morbidities of chemotherapy. The absolute benefit of chemotherapy for older patients is even smaller (3.3% for node-negative and 2.7% for node-positive patients) (15). Clinical parameters, such as lymph node status, tumor size, and histological grade, can provide prognostic information (16) and are summarized in commonly used clinical guides for deciding whether to treat with chemotherapy such as the National Institutes of Health (NIH) (17) or St. Gallen (18) consensus criteria. Nonetheless, risk stratification based on clinical parameters is far from perfect and, as a result, many women who are unlikely to benefit are treated with chemotherapy.
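The arithmetic behind the absolute benefit figures and the 89–93% range quoted above can be checked with a short sketch. It assumes that the share of women who do not benefit is simply the complement of the absolute survival benefit; the function names are illustrative.

```python
def absolute_benefit(survival_untreated_pct, survival_treated_pct):
    """Absolute gain in 10-year survival, in percentage points."""
    return survival_treated_pct - survival_untreated_pct

def pct_without_benefit(benefit_pct):
    """Share of treated women whose 10-year survival is unchanged
    (assumed to be the complement of the absolute benefit)."""
    return 100 - benefit_pct

# Figures quoted in the text for women younger than 50 years.
node_negative = absolute_benefit(71, 78)   # 7 percentage points
node_positive = absolute_benefit(42, 53)   # 11 percentage points

no_benefit_range = (pct_without_benefit(node_positive),
                    pct_without_benefit(node_negative))  # (89, 93)
```

Under this reading, treating 100 node-negative women yields 7 additional 10-year survivors, while the other 93 are exposed to chemotherapy without a change in that outcome.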

Because the presence of the wound-response signature in the primary tumor is associated with an increased risk of subsequent metastasis, we used a scalable wound-response signature to identify a subset of patients with a predicted risk of subsequent metastasis of <10%. Within this low-risk population, the expected absolute benefit from chemotherapy would be very small and the decision to forgo chemotherapy might be justified. We used the serum-activated fibroblast centroid to assign a correlation score to each tumor in the data set. We set a threshold for the correlation score that was able to identify 90% of all patients with subsequent metastasis; this threshold was validated by first learning the threshold in half of the samples and showing an equivalent performance in the remaining half of the data set.

We then tested whether this scaled wound-response signature provided improved risk stratification compared to traditional clinical criteria. Indeed, patients who were assigned as high risk by the NIH or St. Gallen consensus criteria had heterogeneous outcomes, and within these sets of conventional “high risk” patients, the supervised wound response score was able to identify a subset of patients with a low risk of subsequent metastasis (Fig. 2 A and B). A total of 185 of the patients represented in the NKI data set had never received adjuvant chemotherapy; the clinical outcomes of these patients allowed us to examine the appropriateness of the chemotherapy decisions indicated by the clinical guidelines or the wound signature. As shown in Fig. 2C, the majority of patients who did not develop metastasis in this series were stratified as high risk by the NIH or St. Gallen criteria, and according to these criteria would have been treated with chemotherapy that would not benefit them. The wound-response signature appropriately identified 90% of patients who developed metastases as the first recurrence (the end point of the supervised scaling), and at the same time would have spared 30% of women who did not develop metastasis from exposure to chemotherapy. These results illustrate the potential utility and improved risk stratification that might be achieved by scaling the wound-response signature to fit the prognostic goals in a clinical setting.

Fig. 2.

A scalable wound-response signature as a guide for chemotherapy. (A) Wound-response signature adds prognostic information within the group of high-risk patients identified by NIH consensus criteria. According to the NIH criteria, 284 patients are high risk and advised to undergo adjuvant chemotherapy; 72 patients had tumor-positive lymph nodes. Patients were classified by using the serum-activated fibroblast centroid (threshold = −0.15). The 10-year DMFP for the activated (n = 221) vs. quiescent (n = 61) group is 58% vs. 83%, respectively (P = 0.0002). (B) Wound-response signature stratifies St. Gallen criteria high-risk patients. According to St. Gallen criteria, 271 patients are high risk and advised to undergo adjuvant treatment; 72 patients had tumor-positive lymph nodes. When the supervised wound signature was used, the 10-year DMFP for the activated (n = 217) vs. quiescent (n = 56) group is 59% vs. 83%, respectively (P = 0.0005). (C) Graphical representation of the number of patients advised to undergo adjuvant systemic treatment and their eventual outcomes based on the supervised wound-response signature or the NIH or St. Gallen criteria in the 185 patients in this data set who did not receive adjuvant chemotherapy. Forty patients had tumor-positive lymph nodes. Yellow indicates chemotherapy; blue indicates no chemotherapy. The bar at left shows which patients have developed distant metastasis as first event: black indicates distant metastasis; white indicates no metastasis. Thus, blue in the lower bar indicates the potentially undertreated patients, and yellow in the upper bar shows the potentially overtreated patients.

Integration of Diverse Gene Expression Signatures. How can we integrate the information from disparate prognostic signatures that have been identified for breast cancer to optimize risk stratification? We focused on three signatures that have been validated in independent studies and represent distinct analytic strategies. Perou et al. (3) used an unsupervised clustering strategy to identify subtypes of locally advanced breast tumors with pervasive differences in global gene expression patterns; the subtypes are thought to represent distinct biologic entities and were associated with different risk of metastasis (4, 13). At least five subtypes were distinguished, termed basal-like, ErbB2, luminal A, luminal B, and normal-like, and can be identified by the pattern of expression of a set of 500 “intrinsic genes.” In contrast, van't Veer et al. (5) selected a 70-gene signature based on the association of the expression of each gene with the likelihood of metastasis within 5 years. The 70-gene signature was trained on a subset of the 295 patients studied in the present work and validated on the entire group of 295 patients (9). Finally, the wound-response signature was identified in a hypothesis-initiated approach that specifically tested the relationship between tumor progression and a gene expression program identified in an experimental model of a wound response (10). Importantly, these prognostic signatures are defined by expression patterns of distinct sets of genes with little overlap: only 22 genes are shared by two signatures (18 of these genes were shared between wound response and the intrinsic gene list), and no gene is present in all three signatures.

We used each of the three signatures to evaluate this series of 295 breast tumors and found that, despite their different derivations, the signatures gave overlapping and generally consistent predictions of outcomes (Fig. 3A). Many primary tumors from patients that developed subsequent metastasis and died expressed both the 70-gene poor prognosis signature and the wound-response signature. A small group of tumors with poor outcome were not identified as having a poor prognosis by the 70-gene signature but were highlighted by the wound-response signature (right side of Fig. 3A). Similarly, almost all of the basal-like subgroup, so termed because they express markers characteristic of the basal epithelial cells in breast ducts, expressed the 70-gene poor prognosis signature and the activated wound-response signature (P < 0.001, χ² test). These results thus strongly support the idea that the basal-like tumors represent a distinct disease entity with an aggressive clinical course. However, apart from the basal-like subtype, many tumors had expression patterns that were indeterminate with respect to the subtypes as defined by the intrinsic genes; >100 of the 295 tumors could not be confidently assigned to any of the five subtypes defined by Perou et al. (3) and Sørlie et al. (4) (Fig. 7, which is published as supporting information on the PNAS web site). The limited ability to classify these cancers based on the available data may be due to the incomplete representation of genes that define the intrinsic gene list in this data set, or due to the fact that the genes that define this classification system were identified in locally advanced breast cancer samples and may not be optimal for classifying earlier stage cancers.
In multivariate analysis combining (additively) known clinical risk factors with all three signatures, the 70-gene signature and wound-response signature provided independent and significant prognostic information, whereas the intrinsic genes did not (i.e., their prognostic information was subsumed by the other parameters in the model; Tables 3 and 4 and Supporting Text, which are published as supporting information on the PNAS web site).

Integration of diverse gene expression signatures for risk prediction. (A) Compendium of gene expression signatures in 295 breast tumors. Shown are correlation values to canonical centroids of classes defined by intrinsic genes (basal, luminal A, luminal B, ErbB2, vs. normal-like), by the 70 genes (poor prognosis vs. good), and by the wound signature (activated vs. quiescent). Orange indicates positive correlation; blue indicates anticorrelation. Each row is a class; each column is a sample. (Lower) Corresponding clinical outcomes; black vertical bar indicates death or metastasis as the first recurrence event. (B) Summary of decision tree analysis. At each node, the dominant risk factor in multivariate analysis is used to segregate patients, and the process is repeated in each subgroup until patients or risk factors are exhausted. We found that the 70-gene signature was able to identify a group of patients with very good prognosis (group 0), and then the wound signature could divide the patients called “poor” by the 70-gene signature into those with moderate and significantly worse outcomes (groups 1 and 2). (C) Distribution of 144 lymph node-positive patients among the three groups defined in B. Because the 70-gene signature was identified by using a select subset of 60 patients with lymph node-negative disease, the decision tree incorporating the 70-gene signature was performed on the independent lymph node-positive subset to have an unbiased evaluation of risk prediction. Hazard ratios of metastasis risk after adjusting for all other factors listed in Table 1 are shown for the three subgroups stratified by the decision tree. (D) Distant metastasis-free probabilities of patients stratified by the decision tree analysis. A total of 55, 32, and 57 patients are in groups 0, 1, and 2, respectively, and the 10-year DMFPs for the three groups were 89%, 78%, and 47%, respectively (P = 6.94 × 10⁻⁶).

As an alternative approach to considering information from multiple gene expression signatures for clinical risk stratification, we developed and tested a decision tree algorithm. At each node in the decision tree, we considered all clinical risk factors and gene expression profiles, selected the parameter and threshold that best segregated the patients with divergent outcomes, and reiterated the process on each resulting subgroup until the patients or risk factors were exhausted. We discovered that, in decision trees incorporating gene expression signatures, the 70-gene and wound-response signatures were sufficient to capture most of the prognostic information in only two steps (Fig. 3 B–D). Modeling of nonlinear interactions between the gene expression signatures and clinical risk factors independently yielded a similar conclusion (Fig. 4). For patients with early breast cancer and lymph node involvement, the key clinical decision is whether and how to treat with adjuvant chemotherapy. As reported (9), patients with the favorable 70-gene profile had ≈90% metastasis-free probability (group 0). Patients whose cancers had a poor-prognosis 70-gene profile, but lacked the activated wound-response signature, had a risk profile similar to the aggregated average baseline (group 1); patients whose cancers had both the activated wound-response signature and the 70-gene poor prognosis signature (group 2) had a risk of metastatic disease ≈6.4-fold higher than did patients in group 0 (10-year DMFP of 89%, 78%, and 47% for groups 0, 1, and 2, respectively). Thus, the patients in group 0 might reasonably consider forgoing adjuvant chemotherapy, whereas the patients in group 2 have a risk profile more similar to patients with locally advanced disease and might be recommended for dose-dense or taxane-based adjuvant chemotherapy (19, 20).
Together, these results illustrate that adding the wound-response signature to existing clinical, pathologic, and gene expression prognostic factors can significantly improve risk stratification and clinical decision making.
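
The two-step tree that emerged from this analysis reduces to a simple rule, sketched below. This is an illustration rather than the authors' code: the function name `risk_group` and the toy patient list are hypothetical, and the binary signature calls are assumed to have been made already. Only the group definitions and the 10-year DMFP values are taken from the text.

```python
from collections import Counter

# 10-year distant metastasis-free probabilities reported in the text
# for groups 0, 1, and 2.
REPORTED_DMFP = {0: 0.89, 1: 0.78, 2: 0.47}


def risk_group(seventy_gene_poor, wound_activated):
    """Two-step rule: split on the 70-gene call, then on the wound call."""
    if not seventy_gene_poor:
        return 0  # favorable 70-gene profile, regardless of wound call
    return 2 if wound_activated else 1


# Hypothetical patients: (70-gene poor prognosis?, wound signature activated?)
patients = [(False, False), (False, True), (True, False), (True, True), (True, True)]

groups = [risk_group(poor, activated) for poor, activated in patients]
counts = Counter(groups)
for group in sorted(counts):
    print(f"group {group}: n = {counts[group]}, reported 10-year DMFP = {REPORTED_DMFP[group]:.0%}")
```

The wound-response call is consulted only for tumors with a poor-prognosis 70-gene profile, which mirrors the order of the splits in the tree.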

Discussion

We used an independent data set to confirm that a wound-response gene expression signature is a powerful predictor of clinical outcome in patients with early stage breast cancers. Together with our previous results on locally advanced breast, lung, and gastric cancer, these findings reinforce the concept that a gene expression program related to the physiological response to a wound is frequently activated in common human epithelial tumors and confers increased risk of metastasis and cancer progression. In the future, methods that simplify the evaluation of this molecular signature will be required to allow routine use of the wound-response signature in clinical decision making. Prospective studies are needed to determine whether treatment decisions based on the wound-response signature might benefit patients.

The molecular mechanisms that activate, sustain, and eventually shut off the wound-response-like gene expression program in tumors should be investigated. Because the wound-response signature delineates the risk for metastasis, it is possible that the high-risk breast cancer patients it identifies might some day benefit from therapies that target the wound response.

We have examined approaches to parameterize the wound-response signature so that it can be evaluated in tumors individually to yield a quantitative score; the interpretation of the wound signature score can then be rationally directed to suit the clinical task. As a first step toward integrating diverse prognostic signatures, we examined the interactions and information provided by three independent methods for using global gene expression patterns to classify breast cancers and predict their course: one that defined five molecular subtypes, one that was discovered by directly fitting to survival data, and one based on an in vitro model of a wound response. The different signatures each classified tumors into coherent and internally consistent groups, and where the signatures diverged, the combined information gave improved risk stratification compared with individual signatures. These results show that diverse analytic strategies are continuing to identify distinct molecular features that are related to poor prognosis in these tumors. Visualizing the connections between the different gene expression signatures suggests potential explanations for disparate clinical outcomes and sets the stage for directed experimentation. For example, the high-level activation of the wound signature in the basal-like subtype of breast cancers raises the possibility that basal epithelial cells in breast ducts have distinct roles in wound healing and may differentially regulate the CSR genes.
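
The per-tumor quantitative score described above can be sketched as a correlation between a tumor's expression profile and a signature centroid, followed by a threshold. Several details are assumptions made for illustration: the use of Pearson correlation, the direction of the inequality at the threshold, the function names, and the toy four-gene vectors. The threshold of -0.15 is the value quoted in the figure caption for the serum activated fibroblast centroid.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def wound_call(tumor, centroid, threshold=-0.15):
    """Score one tumor against the centroid and apply a threshold.

    Assumption: scores strictly above the threshold are called 'activated'.
    """
    score = pearson(tumor, centroid)
    return score, ("activated" if score > threshold else "quiescent")


# Hypothetical toy vectors over four genes (not real expression data).
centroid = [1.0, -1.0, 0.5, -0.5]
tumor_a = [2.0, -2.0, 1.0, -1.0]   # same pattern as the centroid
tumor_b = [-1.0, 1.0, -0.5, 0.5]   # opposite pattern

score_a, call_a = wound_call(tumor_a, centroid)
score_b, call_b = wound_call(tumor_b, centroid)
print(call_a, call_b)
```

Because the score is continuous, the threshold can be moved to trade sensitivity against specificity, which is the sense in which interpretation of the score can be directed to suit the clinical task.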

Direct approaches to building prognostic models from global gene expression data by simply fitting the models to clinical outcome features are restricted to a palette of relatively simple models to avoid overfitting. An optimal model, reflecting the underlying pathogenic mechanisms, may be poorly represented by any of the models evaluated in these top-down supervised approaches and thus never discovered. Our results illustrate the potential advantages of a “bottom-up” approach that builds from gene expression signatures developed to represent specific hypotheses about underlying pathogenic mechanisms. Such an approach has the potential to outperform the top-down model-independent approaches in improving cancer stratification and clinical decision making. Moreover, this model-dependent bottom-up approach has the advantage of providing specific testable ideas about pathogenic mechanisms and thereby potential targets for treatment.

Acknowledgments

This work was supported by National Institutes of Health Grants CA77097 and CA85129 (to P.O.B.) and AR050007 (to H.Y.C.), a National Science Foundation Predoctoral Fellowship (to J.B.S.), Dutch Cancer Society Grant NKB 2002-2575 (to M.v.d.V., D.N., L.v.t.V., and H.B.), and the Howard Hughes Medical Institute. P.O.B. is an Investigator of the Howard Hughes Medical Institute.

Notes

Author contributions: H.Y.C., D.S.A.N., M.v.d.R., and P.O.B. designed research; H.Y.C., D.S.A.N., and T.H. performed research; H.Y.C., D.S.A.N., T.H., J.B.S., H.D., Y.H., and L.J.v.t.V. contributed new reagents/analytic tools; H.Y.C., D.S.A.N., T.H., R.T., J.B.S., T.S., H.B., M.J.v.d.V., and P.O.B. analyzed data; and H.Y.C., D.S.A.N., H.B., M.J.v.d.V., and P.O.B. wrote the paper.

Abbreviations: CSR, core serum response; DMFP, distant metastasis-free probability; NIH, National Institutes of Health.

See Commentary on page 3531.

References

1. Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. C., Sabet, H., Tran, T., Yu, X., et al. (2000) Nature 403, 503-511.

2. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., et al. (1999) Science 286, 531-537.

3. Perou, C. M., Sørlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., et al. (2000) Nature 406, 747-752.

4. Sørlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., et al. (2001) Proc. Natl. Acad. Sci. USA 98, 10869-10874.

5. van't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., et al. (2002) Nature 415, 530-536.

6. Ramaswamy, S., Ross, K. N., Lander, E. S. & Golub, T. R. (2003) Nat. Genet. 33, 49-54.

7. Lapointe, J., Li, C., Higgins, J. P., van de Rijn, M., Bair, E., Montgomery, K., Ferrari, M., Egevad, L., Rayford, W., Bergerheim, U., et al. (2004) Proc. Natl. Acad. Sci. USA 101, 811-816.

8. Huang, E., Cheng, S. H., Dressman, H., Pittman, J., Tsou, M. H., Horng, C. F., Bild, A., Iversen, E. S., Liao, M., Chen, C. M., et al. (2003) Lancet 361, 1590-1596.

9. van de Vijver, M. J., He, Y. D., van't Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., et al. (2002) N. Engl. J. Med. 347, 1999-2009.

10. Chang, H. Y., Sneddon, J. B., Alizadeh, A. A., Sood, R., West, R. B., Montgomery, K., Chi, J. T., van de Rijn, M., Botstein, D. & Brown, P. O. (2004) PLoS Biol. 2, E7.

11. Huang, E., Ishida, S., Pittman, J., Dressman, H., Bild, A., Kloos, M., D'Amico, M., Pestell, R. G., West, M. & Nevins, J. R. (2003) Nat. Genet. 34, 226-230.

12. Lamb, J., Ramaswamy, S., Ford, H. L., Contreras, B., Martinez, R. V., Kittrell, F. S., Zahnow, C. A., Patterson, N., Golub, T. R. & Ewen, M. E. (2003) Cell 114, 323-334.

13. Sørlie, T., Tibshirani, R., Parker, J., Hastie, T., Marron, J. S., Nobel, A., Deng, S., Johnsen, H., Pesich, R., Geisler, S., et al. (2003) Proc. Natl. Acad. Sci. USA 100, 8418-8423.

14. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868.

15. Early Breast Cancer Trialists' Collaborative Group (1998) Lancet 352, 930-942.

16. Isaacs, C., Stearns, V. & Hayes, D. F. (2001) Semin. Oncol. 28, 53-67.

17. Eifel, P., Axelson, J. A., Costa, J., Crowley, J., Curran, W. J., Jr., Deshler, A., Fulton, S., Hendricks, C. B., Kemeny, M., Kornblith, A. B., et al. (2001) J. Natl. Cancer Inst. 93, 979-989.

18. Goldhirsch, A., Wood, W. C., Gelber, R. D., Coates, A. S., Thurlimann, B. & Senn, H. J. (2003) J. Clin. Oncol. 21, 3357-3365.

19. Citron, M. L., Berry, D. A., Cirrincione, C., Hudis, C., Winer, E. P., Gradishar, W. J., Davidson, N. E., Martino, S., Livingston, R., Ingle, J. N., et al. (2003) J. Clin. Oncol. 21, 1431-1439.

20. Nowak, A. K., Wilcken, N. R., Stockler, M. R., Hamilton, A. & Ghersi, D. (2004) Lancet Oncol. 5, 372-380.

