Implementing an online tool for genome-wide validation of survival-associated biomarkers in ovarian-cancer using microarray data from 1287 patients

Abstract

The validation of prognostic biomarkers in large independent patient cohorts is a major bottleneck in ovarian cancer research. We implemented an online tool to assess the prognostic value of the expression levels of all microarray-quantified genes in ovarian cancer patients. First, a database was set up using gene expression data and survival information of 1287 ovarian cancer patients downloaded from Gene Expression Omnibus and The Cancer Genome Atlas (Affymetrix HG-U133A, HG-U133A 2.0, and HG-U133 Plus 2.0 microarrays). After quality control and normalization, only probes present on all three Affymetrix platforms were retained (_n_=22 277). To analyze the prognostic value of the selected gene, we divided the patients into two groups according to various quantile expressions of the gene. These groups were then compared using progression-free survival (_n_=1090) or overall survival (_n_=1287). A Kaplan–Meier survival plot was generated and significance was computed. The tool can be accessed online at www.kmplot.com/ovar. We used this integrative data analysis tool to validate the prognostic power of 37 biomarkers identified in the literature. Of these, CA125 (MUC16; _P_=3.7×10⁻⁵, hazard ratio (HR)=1.4), CDKN1B (_P_=5.4×10⁻⁵, HR=1.4), KLK6 (_P_=0.002, HR=0.79), IFNG (_P_=0.004, HR=0.81), P16 (_P_=0.02, HR=0.66), and BIRC5 (_P_=0.00017, HR=0.75) were associated with survival. The combination of several probe sets can further increase prediction efficiency. In summary, we developed a global online biomarker validation platform that mines all available microarray data to assess the prognostic power of 22 277 genes in 1287 ovarian cancer patients. We specifically used this tool to evaluate the effect of 37 previously published biomarkers on ovarian cancer prognosis.

Introduction

With a mortality of 8.4 per 100 000 women, ovarian cancer is the most common cause of death among gynecological malignancies (http://seer.cancer.gov) with a 5-year survival rate of 10–30%. Relative to breast cancer, the molecular characteristics of epithelial ovarian cancer (EOC) are more heterogeneous. Despite extensive research, clinical–pathological factors including tumor stage, residual disease after surgery, histological type, and tumor grade are still the most important features related to patient outcome. To date, only two biomarkers have been approved by the Food and Drug Administration (FDA) for monitoring patients with EOC: CA125 (MUC16; Gadducci et al. 1995, 2004, Cooper et al. 2002, Riedinger et al. 2006) and HE4 (WFDC2; Huhtinen et al. 2009, Moore et al. 2009, 2010).

Several additional genes have been suggested as potential biomarkers for the progression of EOC. Low expression of p21 (Ferrandina et al. 2000, Plisiecka-Halasa et al. 2003, Bali et al. 2004), bax (Tai et al. 1998, Skirnisdottir et al. 2001), and hTERT (Brustmann 2005) and high expression of survivin (Sui et al. 2002), VEGFR (Hefler et al. 2006), p53 (Buttitta et al. 1997, Reles et al. 2001), human kallikrein 6 (Diamandis et al. 2003), human kallikrein 10 (Luo et al. 2001), Interleukin 6 (Scambia et al. 1995), p27 (Newcomb et al. 1999, Masciullo et al. 2000, Korkolopoulou et al. 2002, Schmider-Ross et al. 2006), cyclin D1 (Bali et al. 2004, Barbieri et al. 2004), cyclin D3 (Levidou et al. 2007), cyclin E (Sui et al. 2001, Farley et al. 2003, Rosen et al. 2006, Bedrosian et al. 2007), Bcl-xL (Materna et al. 2007), cIAP (Psyrri et al. 2006), and ERBB1 (Skirnisdottir et al. 2004, Psyrri et al. 2005) could represent prognostic variables for poor clinical outcome. In addition, the genome-wide investigation of adequate clinical cohorts delivers an unprecedented number of potential new biomarkers (Denkert et al. 2009).

However, most of these potential biomarkers have not been validated in multivariate analyses, nor has their discriminative power been confirmed in large clinical cohorts. Even more alarmingly, many reports have questioned or rejected a correlation between a proposed biomarker and clinical outcome. Doubts were raised regarding the markers CA125 (Cruickshank et al. 1987, van der Burg et al. 1988, Rustin et al. 1989, Sevelda et al. 1989), cyclin D1 (Masciullo et al. 1997, Dhar et al. 1999), p16 (Milde-Langosch et al. 2003, Khouja et al. 2007), p21 (Baekelandt et al. 1999, Levesque et al. 2000, Schuyer et al. 2001), p27 (Schmider et al. 2000), p53 (Smith-Sorensen et al. 1998, Wang et al. 2004, Green et al. 2006), Bcl-xl (Baekelandt et al. 2000), cIAP (Kleinberg et al. 2007), survivin (Cohen et al. 2003, Ferrandina et al. 2005), hTERT (Wisman et al. 2003, Widschwendter et al. 2004), ERBB1 (Berchuck et al. 1991, Meden et al. 1995, Nielsen et al. 2004), and ERBB2 (Rubin et al. 1993, Meden et al. 1995, Ross et al. 1999, Nielsen et al. 2004, Riener et al. 2004).

Given the large number of potential biomarkers for EOC, the immediate challenge is to validate the most robust candidates eligible for further investigation. Recent advances in genomic technologies, together with powerful bioinformatic tools, make it possible to meet this prerequisite. We recently developed an online biomarker validation tool using microarray data of 2000 breast cancer patients (Gyorffy et al. 2010). In that tool, the expression of a selected gene is used to split patients into groups, and the proportional survival of these groups is then compared.

In this study, our aim was to implement an online survival analysis tool for the rapid assessment of prognosis-related genes in ovarian cancer and to test the validity of previously proposed biomarkers. Furthermore, we also developed additional analysis options including the computation of multigenic prognosis predictors and the option of grouping patients based on applied treatment protocols.

Materials and methods

Collection of ovarian cancer microarray data sets

We searched Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) and The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov) to identify data sets suitable for the analysis. For this search, the keywords ‘ovarian’, ‘cancer’, ‘survival’, ‘gpl96’, ‘gpl570’, and ‘gpl571’ were used. Only publications with available raw microarray gene expression data, clinical survival information, and at least 20 patients were included. Only three microarray platforms, GPL96 (Affymetrix HG-U133A), GPL570 (Affymetrix HG-U133 Plus 2.0), and GPL571/GPL3921 (Affymetrix HG-U133A 2.0), were considered because they are frequently used and because these particular arrays have 22 277 probe sets (representing 13 435 unique genes) in common. The use of almost identical platforms and identical probe sets is vital because different platforms for gene expression profiling measure expression of the same gene with varying accuracy, on different relative scales, and with diverse dynamic ranges (Tan et al. 2003). Finally, we checked all samples using the ranked expression of all genes to identify microarrays that had been published more than once.
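The duplicate check described above can be illustrated with a small sketch. The paper does not state the exact matching criterion, so this example assumes that two arrays count as the same hybridization when the rank order of all their probes is identical; the function names and the toy values are hypothetical.

```python
from collections import defaultdict

def rank_signature(expression):
    """Return probe indices ordered by expression value (ties broken by index)."""
    return tuple(sorted(range(len(expression)), key=lambda i: (expression[i], i)))

def find_duplicate_arrays(samples):
    """Group sample IDs whose probes have an identical expression ranking.

    `samples` maps a sample ID to a list of expression values, one per probe,
    with probes in the same order for every sample.
    """
    groups = defaultdict(list)
    for sample_id, expression in samples.items():
        groups[rank_signature(expression)].append(sample_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]

# Toy data: "B" is "A" rescaled, so the ranking of its probes is identical.
samples = {
    "A": [120.0, 15.5, 980.2, 44.1],
    "B": [240.0, 31.0, 1960.4, 88.2],
    "C": [10.0, 500.0, 20.0, 300.0],
}
duplicates = find_duplicate_arrays(samples)
print(duplicates)
```

Comparing ranks rather than raw values makes the check insensitive to rescaling, which matters when the same array has been deposited after different preprocessing.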

Setup of server for online survival calculation

The raw .CEL files were MAS5 normalized in the R statistical environment (www.r-project.org) using the affy Bioconductor library (Gautier et al. 2004). MAS5 can be applied to individual chips, making future extensions of the database uncomplicated. Furthermore, MAS5 ranked among the best normalization methods when compared with the results of RT-PCR measurements in our recent study (Gyorffy et al. 2009). For the analysis, only probes measured on GPL96, GPL570, and GPL571/GPL3921 were retained (_n_=22 277). At this stage, we performed a second scaling normalization to set the average expression on each chip to 1000. Although this technique cannot remove batch effects entirely, it can significantly reduce them (Sims et al. 2008). We integrated the gene expression and clinical data using PostgreSQL, an open-source object-relational database system (www.postgresql.org). Data security is ensured through PostgreSQL permissions that are imposed on individual tables in the project databases.
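As an illustration of the second scaling step, the following minimal sketch rescales one chip so that its mean expression equals 1000. It assumes a plain arithmetic mean over the retained probes (the text does not specify whether a trimmed mean was used), and the function name and signal values are made up for the example.

```python
def scale_to_target_mean(chip_values, target_mean=1000.0):
    """Rescale one chip so that its mean expression equals `target_mean`."""
    if not chip_values:
        raise ValueError("chip has no probe values")
    current_mean = sum(chip_values) / len(chip_values)
    if current_mean <= 0:
        raise ValueError("mean expression must be positive")
    factor = target_mean / current_mean
    return [value * factor for value in chip_values]

# Made-up MAS5 signals for four probes on one chip (mean = 500).
chip = [125.0, 375.0, 250.0, 1250.0]
scaled = scale_to_target_mean(chip)
print(scaled)
```

Because the factor is computed per chip, a newly added array can be scaled without touching the arrays already in the database.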

The KMplot web application is accessible through a platform-independent user interface. JavaScript and Ajax technologies are used to increase the interactivity of the service. The server is hosted on Debian Linux (www.debian.org) and is powered by Apache (www.apache.org). The server-side scripts were developed in hypertext preprocessor (PHP), which controls the analysis requests and delivers the results. Open Database Connectivity is used as a middleware layer between R and the PostgreSQL database via the RODBC package (cran.r-project.org/package=RODBC). The package ‘survival’ is used to calculate and plot Kaplan–Meier survival curves, and the number-at-risk is indicated below the main plot. Hazard ratio (HR; and 95% confidence intervals) and logrank P are calculated and displayed. The central server for the Kaplan–Meier plotter for ovarian cancer can be reached at www.kmplot.com/ovar.
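To make the statistics concrete, here is a self-contained sketch of a Kaplan–Meier estimate and a two-group logrank test. It is not the server code: the tool uses the R ‘survival’ package, whereas this is a textbook implementation in plain Python, with hypothetical function names and invented follow-up times. The hazard ratio is left out because the text does not say how it is estimated.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events[i]` is 1 if the event was observed at `times[i]`, 0 if censored.
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - died / at_risk
        curve.append((t, survival))
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n = n_a + sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = d_a + sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Invented follow-up times in months; 1 = event observed, 0 = censored.
low_times, low_events = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
high_times, high_events = [1, 2, 3, 4, 8, 9], [1, 1, 1, 1, 1, 0]
curve_low = kaplan_meier(low_times, low_events)
chi_square, p_value = logrank_test(low_times, low_events, high_times, high_events)
print(curve_low, chi_square, p_value)
```

The loops recount the risk set at every event time, which is quadratic but keeps the correspondence with the definitions easy to follow; a production implementation would sort once and update the counts incrementally.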

Probe set options

We also implemented a set of probe-set-related options, including the option to use all probe sets available for a given gene on the microarray simultaneously and to use a combined expression of several probe sets. Using this option, it is possible to assess the effect of the mean expression of gene combinations on survival.

In addition, a bee swarm plot can be drawn using the beeswarm package (www.cbs.dtu.dk/∼eklund/beeswarm/). The bee swarm plot visualizes gene expression as nonoverlapping points in a one-dimensional scatter plot. It can be used to quickly identify outlier samples and genes with a bimodal distribution.

Validation of previously published EOC biomarkers

A PubMed search was performed using the keywords ‘ovarian cancer’, ‘survival’, ‘biomarker’, and ‘gene expression’ to identify genes described in the literature as potential EOC biomarkers. Then, using PubMed gene, we added a unique gene symbol for each of the genes and identified the corresponding Affymetrix probe set IDs. The capability of these genes to predict survival was measured by using the probe set IDs in the online analysis tool.

In the combination of several markers, their mean expression is first computed for each sample. Then, the median of these is used for splitting the patients into cohorts during the analysis.
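The two steps for combined markers can be sketched as follows. The marker names and expression values are hypothetical, and the handling of samples that fall exactly on the median is an assumption, since the text does not specify it.

```python
from statistics import mean, median

def combined_marker_groups(expression_by_marker):
    """Split samples into 'high' and 'low' groups by a multi-marker score.

    `expression_by_marker` maps a marker name to a list of expression values,
    one per sample, with samples in the same order for every marker.
    """
    per_sample = zip(*expression_by_marker.values())
    scores = [mean(values) for values in per_sample]
    cutoff = median(scores)
    # Assumption: samples exactly at the cutoff go to the 'low' group.
    groups = ["high" if score > cutoff else "low" for score in scores]
    return scores, cutoff, groups

# Hypothetical expression of two markers in four samples.
expression_by_marker = {
    "marker_1": [100.0, 400.0, 900.0, 300.0],
    "marker_2": [300.0, 200.0, 700.0, 100.0],
}
scores, cutoff, groups = combined_marker_groups(expression_by_marker)
print(scores, cutoff, groups)
```

Averaging raw expression gives more weight to markers with a larger dynamic range; whether the markers should be standardized first is a design decision that this sketch does not address.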

Results

Construction of combined ovarian cancer microarray database

We identified 1287 unique patients in eight data sets meeting our criteria in GEO and TCGA. In the GSE3149 data set, we found two samples that had been published repeatedly (GSM70546=GSM70547 and GSM70511=GSM70512). Because the appropriate clinical information could not be assigned unambiguously to these samples, we removed them from the final database. Of the patients, 72% have serous and 2% have endometrioid tumors. Patients are distributed across stage 1 (_n_=34, 3.6%), stage 2 (_n_=122, 13%), stage 3 (_n_=672, 71.7%), and stage 4 (_n_=109, 11.6%). Debulking was optimal (residual tumor <1 cm) in 674 out of 1119 patients. The median overall survival is 31.0 months, 1090 patients have progression-free survival data, and 1287 have overall survival data. Note that some publications report ‘disease-free survival’ (Konstantinopoulos et al. 2010) or ‘relapse-free survival’ (Tothill et al. 2008) instead of ‘progression-free survival’. These were merged as ‘progression-free survival’ to enable a meta-analysis of the complete database. A summary of the clinical characteristics of the patients in each data set used in the analysis is shown in Table 1.

Table 1

Clinical properties of the ovarian cancer patients used in the analysis

GEO ID | Reference | GEO platform | No. of samples in data set | Death event | Median overall survival (months) | Serous/endometrioid | Grade (1/2/3) | Stage (1/2/3/4) | Debulk optimal (/out of) | Treatment contains platin (/out of) | Treatment contains Taxol (/out of)
GSE14764 | Denkert et al. 2009 | GPL96 | 80 | 21 | 35.2 | 68/7 | NA | NA | 27/29 | 78/79 | 79/79
GSE15622 | Ahmed et al. 2007 | GPL571 | 35 | 28 | 27.0 | 31/0 | 0/7/28 | 0/0/26/9 | NA | 20/35 | 15/35
GSE19829 | Konstantinopoulos et al. 2010 | GPL570 | 28 | 17 | 35.0 | NA | NA | NA | NA | NA | NA
GSE3149 | Bild et al. 2006 | GPL96 | 116 | 69 | 34.0 | NA | 4/55/54 | 0/1/96/14 | 64/117 | 115/115 | 94/115
GSE9891 | Tothill et al. 2008 | GPL570 | 285 | 110 | 28.0 | 264/20 | 24/18/217 | 19/97/164/0 | 160/229 | 242/282 | 195/282
GSE18520 | Mok et al. 2009 | GPL570 | 53 | 41 | 22.0 | 53/0 | 0/0/53 | NA | NA | NA | NA
GSE26712 | NA | GPL96 | 185 | 129 | 38.7 | NA | NA | NA | 90/185 | NA | NA
TCGA | TCGA 2011 | GPL3921 | 505 | 277 | 30.6 | 505/0 | 5/62/427 | 15/24/386/81 | 333/452 | 458/473 | 233/468
Total |  |  | 1287 | 692 | 31.0 | 925/27 | 33/142/779 | 34/122/672/109 | 674/1121 | 914/985 | 616/980

NA, data not available; /out of, total number of patients with available clinical data.

Setup of online survival analysis platform

The Kaplan–Meier plot shows the association between the investigated marker and survival in which the samples are grouped according to the median (or upper or lower quartile) expression of the selected gene. Before running the analysis, the patients can be filtered using stage, histology, grade, and treatment parameters including debulking status and applied chemotherapy. In addition, as an alternative to progression-free survival, overall survival can also be investigated.
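A minimal sketch of this filter-then-split logic is shown here. The record layout, the field name `stage`, the gene name `GENE_X`, and the quartile interpolation method are all assumptions made for the example and do not describe the actual database schema.

```python
from statistics import quantiles

# Index of each supported cutoff in the list returned by quantiles(..., n=4).
CUTOFFS = {"lower_quartile": 0, "median": 1, "upper_quartile": 2}

def split_by_cutoff(patients, gene, cutoff="median", **filters):
    """Filter patients by clinical fields, then split them on `gene` expression."""
    selected = [p for p in patients
                if all(p.get(field) == value for field, value in filters.items())]
    values = [p["expression"][gene] for p in selected]
    threshold = quantiles(values, n=4, method="inclusive")[CUTOFFS[cutoff]]
    low = [p["id"] for p in selected if p["expression"][gene] <= threshold]
    high = [p["id"] for p in selected if p["expression"][gene] > threshold]
    return threshold, low, high

# Hypothetical patient records; only stage 3 patients pass the filter.
patients = [
    {"id": "p0", "stage": 1, "expression": {"GENE_X": 50.0}},
    {"id": "p1", "stage": 3, "expression": {"GENE_X": 100.0}},
    {"id": "p2", "stage": 3, "expression": {"GENE_X": 200.0}},
    {"id": "p3", "stage": 3, "expression": {"GENE_X": 300.0}},
    {"id": "p4", "stage": 3, "expression": {"GENE_X": 400.0}},
    {"id": "p5", "stage": 3, "expression": {"GENE_X": 500.0}},
]
threshold, low, high = split_by_cutoff(patients, "GENE_X", cutoff="median", stage=3)
print(threshold, low, high)
```

Note that the cutoff is computed after filtering, so the threshold reflects the selected subgroup rather than the whole database; the text does not say which of the two the tool uses.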

Since there is an already established biomarker (CA125), a clinician might be interested in a specific clinical cohort of patients having low CA125 levels. Therefore, we added an additional filtering option in which only patients having an average CA125 expression (average of the two reliable probe sets) below the lower quartile of all patients are included. We must note that while this study measured tissue levels of CA125, the FDA-approved test for ovarian cancer is serum based.

Validation of previously published EOC biomarkers

Markers of ovarian cancer prognosis were identified by a literature search. We computed Kaplan–Meier plots for 37 proposed biomarkers to assess their effect on prognosis (for the complete results see Table 2 and Fig. 1). Each biomarker was investigated in a subset of patients with clinical characteristics equivalent to those of the cohort in which it was originally described. High significance was achieved for CA125, KLK6, IFNG, P15, P16, CDKN1B, and BIRC5. In addition, we also ran the analysis for predicting progression-free survival in all patients.

Table 2

The association between prognostic markers and progression-free survival. The patients were divided into two groups as having higher or lower expression as compared to the median. The markers were analyzed in subsets of patients with equivalent clinical characteristics to the cohorts in which the association has previously been described

Symbol | Gene | Reference | Survival | Analyzed in the cohort of | Affymetrix ID | Q | HR | P
CA125 (MUC16) | CA 125 | Gadducci et al. 1995, Cooper et al. 2002, Gadducci et al. 2004, Riedinger et al. 2006 | PFS | All patients | 220196_at | 2 | NS | NS
 |  |  |  |  | 201384_s_at | 1 | 1.3 | 0.0003*
 |  |  |  |  | 201383_s_at | 1 | 1.4 | 3.7×10⁻⁵*,a
KRT19 | Cytokeratin 19 | Tempfer et al. 1998, Gadducci et al. 2001 | PFS | Debulk=subopt. | 201650_at | 1 | NS | NS
KLK6 | Kallikrein 6 | Diamandis et al. 2003 | PFS | All patients | 216699_s_at | 2 | 0.79 | 0.002*
 |  |  |  |  | 204733_at | 1 | NS | NS
KLK10 | Kallikrein 10 | Luo et al. 2001 | PFS | Stage=3+4 | 209792_s_at | 1 | NS | NS
 |  |  |  |  | 215808_at | 3 | NS | NS
IL6 | Interleukin 6 | Scambia et al. 1995 | OS | All patients | 205207_at | 2 | NS | NS
IL7 | Interleukin 7 | Lambeck et al. 2007 | OS | All patients | 206693_at | 3 | NS | NS
IFNG | γ-Interferon | Marth et al. 2004 | PFS | All patients | 210354_at | 3 | 0.81 | 0.004*
FAS | sFas | Hefler et al. 2000, Konno et al. 2000 | PFS | All patients | 204780_s_at | 1 | 1.2 | 0.017
 |  |  |  |  | 204781_s_at | 1 | NS | NS
 |  |  |  |  | 212218_s_at | 1 | 0.84 | 0.024
 |  |  |  |  | 215719_x_at | 2 | NS | NS
 |  |  |  |  | 216252_x_at | 2 | NS | NS
 |  |  |  |  | 217006_x_at | 3 | NS | NS
VEGFR | VEGFR | Hefler et al. 2006 | OS | All patients | 203934_at | 2 | 1.2 | 0.064
CCND1 | Cyclin D1 | Bali et al. 2004, Barbieri et al. 2004 | OS | Stage=3+4 | 208711_s_at | 1 | NS | NS
 |  |  |  |  | 208712_at | 1 | NS | NS
CCND3 | Cyclin D3 | Levidou et al. 2007 | OS | All patients | 201700_at | 1 | NS | NS
CCNE | Cyclin E | Sui et al. 2001, Farley et al. 2003, Rosen et al. 2006, Bedrosian et al. 2007 | OS | Debulk=subopt. | 213523_at | 2 | NS | NS
 |  |  |  |  | 205034_at | 2 | NS | NS
 |  |  |  |  | 211814_s_at | 3 | NS | NS
P15 (CDKN2B) | p15 | Kudoh et al. 2002 | PFS | All patients | 204599_s_at | 1 | NS | NS
 |  |  |  |  | 212857_x_at | 1 | 1.3 | 0.0005*
 |  |  |  |  | 214512_s_at | 1 | 1.2 | 0.01
 |  |  |  |  | 221727_at | 3 | NS | NS
 |  |  |  |  | 218708_at | 1 | NS | NS
P16 (CDKN2A) | p16 | Katsaros et al. 2004, Kommoss et al. 2007 | PFS | Debulk=subopt. | 207039_at | 2 | 0.66 | 0.002*
 |  |  |  |  | 209644_x_at | 1 | NS | NS
 |  |  |  |  | 211156_at | 3 | 0.69 | 0.009
CDKN1A | p21 | Ferrandina et al. 2000, Plisiecka-Halasa et al. 2003, Bali et al. 2004 | PFS | Histology=serous | 202284_s_at | 1 | NS | NS
CDKN1B | p27 | Newcomb et al. 1999, Masciullo et al. 2000, Korkolopoulou et al. 2002, Schmider-Ross et al. 2006 | PFS | All patients | 209112_at | 1 | 1.4 | 5.4×10⁻⁵*,a
RB1 | pRB | Dong et al. 1997, Konstantinidou et al. 2003 | OS | Stage=1 | 203132_at | 1 | NS | NS
 |  |  |  |  | 211540_s_at | 3 | NS | NS
E2F1 | E2F1 | Suh et al. 2008 | PFS | All patients | 2028_s_at | 1 | 0.83 | 0.017
 |  |  |  |  |  | 3 | NS | NS
E2F2 | E2F2 | Reimer et al. 2007 | PFS | All patients | 207042_at | 3 | 0.86 | 0.037
E2F4 | E2F4 | Reimer et al. 2007 | PFS | All patients | 202248_at | 3 | 0.85 | 0.034
 |  |  |  |  | 38707_r_at | 1 | NS | NS
TP53 | p53 | Buttitta et al. 1997, Reles et al. 2001 | PFS | Stage=3+4 | 211300_s_at | 2 | NS | NS
 |  |  |  |  | 201746_at | 1 | 0.84 | 0.075
TP73 | p73 | Becker et al. 2006 | OS | All patients | 220804_s_at | 3 | NS | NS
BAX | bax | Tai et al. 1998, Skirnisdottir et al. 2001 | PFS | Therapy=contains Taxol | 208478_s_at | 2 | NS | NS
 |  |  |  |  | 211833_s_at | 2 | NS | NS
BCL2L1 | Bcl-xl | Materna et al. 2007 | PFS | All patients | 212312_at | 1 | 0.86 | 0.04
 |  |  |  |  | 215037_s_at | 2 | NS | NS
 |  |  |  |  | 206665_s_at | 3 | NS | NS
BIRC2 | cIAP | Psyrri et al. 2006 | OS | Stage=3+4 | 202076_at | 1 | NS | NS
BIRC5 | Survivin | Sui et al. 2002 | PFS | All patients | 210334_x_at | 2 | 0.75 | 0.00017*
 |  |  |  |  | 202094_at | 2 | 0.84 | 0.018
 |  |  |  |  | 202095_s_at | 1 | 0.84 | 0.018
TERT | hTERT | Brustmann 2005 | OS | Histology=serous | 207199_at | 3 | NS | NS
EGFR | ERBB1 | Skirnisdottir et al. 2004, Psyrri et al. 2005 | PFS | Stage=1+2 | 201983_s_at | 1 | NS | NS
 |  |  |  |  | 201984_s_at | 1 | NS | NS
 |  |  |  |  | 211551_at | 2 | NS | NS
 |  |  |  |  | 210984_x_at | 3 | NS | NS
 |  |  |  |  | 211550_at | 3 | NS | NS
 |  |  |  |  | 211607_x_at | 3 | NS | NS
ERBB2 | ERBB2 | Lassus et al. 2004 | PFS | Histology=serous | 210930_s_at | 3 | NS | NS
 |  |  |  |  | 216836_s_at | 1 | NS | NS
MET | c-Met | Sawada et al. 2007 | OS | Stage=3+4 | 217828_at | 1 | NS | NS
 |  |  |  |  | 203510_at | 1 | NS | NS
 |  |  |  |  | 211599_x_at | 1 | NS | NS
 |  |  |  |  | 213807_x_at | 2 | NS | NS
 |  |  |  |  | 213816_s_at | 3 | NS | NS
MMP2 | MMP-2 | Torng et al. 2004 | PFS | Histology=endom. | 201069_at | 1 | 0.33 | 0.05
MMP9 | MMP-9 | Sillanpaa et al. 2007 | OS | Stage=1 | 203936_s_at | 1 | NS | NS
MMP14 | MT1-MMP | Kamat et al. 2006 | OS | Stage=2+3+4 | 160020_at | 1 | NS | NS
 |  |  |  |  | 202828_s_at | 1 | NS | NS
 |  |  |  |  | 202827_s_at | 2 | NS | NS
 |  |  |  |  | 217279_x_at | 3 | NS | NS
WFDC2 (HE4) | Epididymis protein 4 | Huhtinen et al. 2009, Moore et al. 2009, 2010 | PFS | All patients | 203892_at | 1 | NS | NS
SERPINB5 | Maspin | Secord et al. 2006 | PFS | Debulk=subopt. | 204855_at | 1 | NS | NS
BRCA1 | BRCA1 | Thrall et al. 2006 | OS | All patients | 211851_x_at | 3 | 0.82 | 0.01
 |  |  |  |  | 204531_s_at | 2 | NS | NS
ERCC1 | ERCC1 | Darcy & Tian 2007 | PFS | Stage=3 | 203719_at | 1 | NS | NS
 |  |  |  | Therapy=Tax+Plat | 203720_s_at | 1 | NS | NS

Figure 1

Survival plots depicting the good prognostic effect on progression-free survival of the lower expression of CA125 (A, 201383_s_at), CDKN1B (B, 209112_at), and P15 (C, 212857_x_at). Classification using the mean expression of two genes (CA125+CDKN1B) with a cutoff at the lower quartile results in increased discriminative power (D).

Citation: Endocrine-Related Cancer 19, 2; 10.1530/ERC-11-0329

In an effort to improve accuracy, a pair-wise combination of the three best performing probe sets was assessed independently. The combination of CA125 and CDKN1B with a cutoff at the lower quartile resulted in classification significance superior to the power of the markers independently (HR=1.5 and _P_=5.4×10⁻⁶ vs HR=1.4, _P_=3.7×10⁻⁵ and HR=1.4, _P_=5.4×10⁻⁵ for CA125 and CDKN1B, respectively, see Table 2 and Fig. 1).

Discussion

The validation of prognostic biomarkers is a major bottleneck in ovarian cancer research. Here, we combined multiple large microarray data sets to increase the statistical power for a meta-analysis of 22 277 genes. We developed a freely accessible online tool to estimate the prognostic value of any selected gene in a large cohort of clinical patients. After dividing the patients into two groups based on the expression of the selected gene, a Kaplan–Meier plot is generated. The implemented computations are performed in real time on our server. This enables seamless future extension using new data sets or new filtering options.

We have integrated data sets from GEO and TCGA – ∼40% of samples used by www.kmplot.com/ovar are from the TCGA repository. For the TCGA samples alone, there is an option to perform analyses in the caIntegrator website (https://caintegrator.nci.nih.gov). The samples in TCGA are open access or restricted (access is granted to NIH staff and to eRA Commons principal investigators) – however, the Affymetrix HG-U133 microarray profiles for the ovarian cancer patients are publicly available. We plan to continuously incorporate new GEO data sets as well as new TCGA samples in www.kmplot.com.

In contrast to breast cancer, where several already approved markers are in clinical use, in ovarian cancer only minimal progress has been made in recent years. When investigating the previously proposed biomarkers, we found that only a few genes are actually capable of predicting outcome in our combined data set: CA125, P15, KLK6, IFNG, P16, CDKN1B, and BIRC5. Of these, CA125 and CDKN1B resulted in very robust significance. These results may reflect the high genetic heterogeneity of ovarian cancer (Gyorffy et al. 2008) and emphasize the importance of potential improvements in prognosis.

The most extensively studied marker for EOC is CA125 (Gadducci et al. 1995, 2004, Cooper et al. 2002, Riedinger et al. 2006), and determining its concentration in serum is essential for monitoring ovarian cancer progression. A 50% increase in serum CA125 level has been correlated with progression, and the current definition of the Gynecological Cancer Intergroup defines progression based on two elevated serum CA125 levels. According to our results, the tumor expression level of the CA125 gene was able to predict later clinical outcome. Notably, two different probe sets representing CA125 were significant. A third probe set did not show significant prognostic power, but it also displayed low quality in terms of average expression compared with the two significant probe sets.

The role of the cell cycle control gene p27 (CDKN1B) as a prognostic marker in ovarian cancer was suggested in several studies (Newcomb et al. 1999, Masciullo et al. 2000, Korkolopoulou et al. 2002, Schmider-Ross et al. 2006). In addition, numerous recent analyses also confirmed its role in 205 (Lee et al. 2011), 131 (Skirnisdottir et al. 2011), and 339 (Duncan et al. 2010) patients. p27 is measured by only one probe set on the microarrays, and this probe set delivered high prognostic power in our analysis.

P15 is a tumor suppressor gene previously associated with ovarian cancer progression in 45 patients (Kudoh et al. 2002). The methylation status of P15 has also been investigated but was not an independent prognostic factor in 145 patients (Tam et al. 2007). One of the probe sets measuring P15 (212857_x_at) delivered a high prognostic potential.

To this point, we have investigated the prognostic power of individual probe sets. However, recent reports based on genomic technologies use not only single selected genes, but also a combination of these. In addition, some of the markers are not related to ovarian cancer prognosis in general, but have discriminative potential in one of the subgroups, or are related to different treatment regimens. While the evaluation of all potential markers and all eligible combinations is beyond the scope of this study, our online tool was set up exactly to enable researchers to perform these tests on our database.

In summary, we reviewed previously reported biomarkers of ovarian cancer prognosis and assessed their performance in a meta-analysis of 1287 ovarian cancer patients. We also developed an online biomarker validation platform to mine all available microarray data to assess the prognostic power of 22 277 genes.

Declaration of interest

The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

Funding

The work of B Győrffy was supported by the OTKA PD 83154 grant, by the TAMOP-4.2.1.B-09/1/KMR-2010-0001 grant, by the ETT 029/2009 grant, and by the Alexander von Humboldt Stiftung. Z Szállási was supported by the Breast Cancer Research Foundation.

References