An empirical assessment of validation practices for molecular classifiers

Peter J Castaldi et al. Brief Bioinform. 2011 May.

Abstract

Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR 21–61%) and 29% (IQR 15–65%), respectively]. The median reported sensitivity and specificity were 94% and 98%, respectively, in cross-validation, versus 88% and 81% in independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04–5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n = 758) that cited those in our study sample and identified only one instance of additional, subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased, and genuine progress in this field will require the adoption of routine external validation of molecular classifiers, preferably in much larger studies than is current practice.
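As an illustrative sketch (not part of the paper's analysis), the diagnostic odds ratio (DOR) underlying the rDOR can be computed directly from sensitivity and specificity. Note that the ratio of DORs computed from the reported medians differs from the pooled summary rDOR of 3.26, because the latter is estimated per study and then combined by random-effects meta-analysis rather than taken as a ratio of medians:

```python
def dor(sens, spec):
    """Diagnostic odds ratio: (TP/FN) / (FP/TN), expressed via sensitivity and specificity."""
    return (sens / (1 - sens)) * (spec / (1 - spec))

# Median reported performance from the study sample.
dor_cv  = dor(0.94, 0.98)   # cross-validation
dor_ext = dor(0.88, 0.81)   # independent (external) validation

rdor = dor_cv / dor_ext     # ratio of medians, NOT the meta-analytic summary
print(f"DOR (CV) = {dor_cv:.1f}, DOR (external) = {dor_ext:.1f}, ratio = {rdor:.1f}")
```

The ratio of the median DORs is far larger than the pooled rDOR of 3.26, which illustrates how skewed per-study estimates can be and why a formal random-effects summary is used instead.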

PubMed Disclaimer

Figures

Figure 1:

Random effects meta-analysis of relative diagnostic odds ratios (rDORs) for 28 studies with cross-validation classification estimates and independent validation estimates for the same classifier. The majority of studies show worse classification performance in independent validation than in cross-validation, with a summary rDOR estimate of 3.26 (95% CI 2.04–5.21) (where rDOR=1 represents equal classification performance in cross-validation and independent validation).

Figure 2:

Random effects meta-analysis of relative diagnostic odds ratios (rDORs) for the 28 included studies, stratified by whether feature selection bias is likely to be present. The group with feature selection bias demonstrates significantly worse performance in independent validation than in cross-validation (rDOR 4.50, 95% CI 2.04–5.21), whereas this is not the case in the group that is unlikely to have this bias.
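The feature selection bias stratified on above is easy to reproduce on synthetic data: if features are ranked using the full dataset before cross-validation, even pure noise can yield seemingly excellent accuracy, whereas selecting features within each fold returns accuracy near chance. A minimal pure-NumPy sketch (nearest-centroid classifier, leave-one-out CV; all names, sizes, and parameters here are illustrative and are not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 40, 2000, 10           # samples, candidate features, features kept
X = rng.standard_normal((n, p))  # pure noise: no real class signal
y = np.repeat([0, 1], n // 2)

def top_k(X, y, k):
    """Indices of the k features with the largest between-class mean difference."""
    score = np.abs(X[y == 0].mean(0) - X[y == 1].mean(0))
    return np.argsort(score)[-k:]

def loocv_accuracy(X, y, select_inside):
    """Leave-one-out CV with a nearest-centroid rule; features are selected
    either inside each fold (unbiased) or once on all data (biased)."""
    if not select_inside:
        X = X[:, top_k(X, y, k)]          # selection has seen the test sample: bias
    hits = 0
    for i in range(len(y)):
        tr = np.arange(len(y)) != i
        Xtr, ytr, xte = X[tr], y[tr], X[i]
        if select_inside:
            idx = top_k(Xtr, ytr, k)      # selection excludes the test sample
            Xtr, xte = Xtr[:, idx], xte[idx]
        c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
        pred = int(np.sum((xte - c1) ** 2) < np.sum((xte - c0) ** 2))
        hits += pred == y[i]
    return hits / len(y)

acc_biased = loocv_accuracy(X, y, select_inside=False)
acc_unbiased = loocv_accuracy(X, y, select_inside=True)
print("biased CV accuracy:  ", acc_biased)    # far above chance on pure noise
print("unbiased CV accuracy:", acc_unbiased)  # near 0.5, as it should be
```

The only difference between the two runs is whether feature selection is nested inside the cross-validation loop, which is precisely the methodological distinction used to stratify the studies in Figure 2.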

Figure 3:

Random effects meta-analysis of relative diagnostic odds ratios (rDORs) for the 28 included studies, stratified by whether optimization bias (i.e. reporting the best result from cross-validation rather than the average result across all cross-validations) is likely to be present. The group with optimization bias demonstrates significantly worse performance in independent validation than in cross-validation (rDOR 3.20, 95% CI 1.99–5.15). The group unlikely to have this bias had a more extreme but non-significant point estimate.
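Optimization bias can be simulated even more simply: when many configurations are evaluated on the same cross-validation folds and only the best result is reported, chance alone inflates the estimate. In this hedged sketch (illustrative only, not the paper's analysis), every "configuration" produces predictions that are pure coin flips, so the true accuracy of each is exactly 50%, yet the best-of-many result looks substantially better:

```python
import numpy as np

rng = np.random.default_rng(1)
n_labels, n_configs = 50, 100
y = rng.integers(0, 2, n_labels)    # binary labels for 50 samples

# Each configuration's CV predictions are, by construction, coin flips,
# so every configuration has true accuracy 0.5.
cv_acc = (rng.integers(0, 2, (n_configs, n_labels)) == y).mean(axis=1)

print(f"mean CV accuracy over configs: {cv_acc.mean():.2f}")  # ~0.50
print(f"best CV accuracy reported:     {cv_acc.max():.2f}")   # well above 0.50
```

Reporting `cv_acc.max()` instead of `cv_acc.mean()` is the optimization bias stratified on in Figure 3: the gap between the two is entirely an artifact of selection over random fluctuations.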
