An empirical assessment of validation practices for molecular classifiers
Peter J Castaldi et al. Brief Bioinform. 2011 May.
Abstract
Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21-61%) and 29% (IQR, 15-65%), respectively]. The median reported classification performance was 94% sensitivity and 98% specificity in cross-validation, versus 88% and 81% in independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04-5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n = 758) that cited those in our study sample and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased, and genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice.
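The abstract's headline comparison can be made concrete with a short worked example. The sketch below (Python; the helper function is ours, not from the paper) computes a diagnostic odds ratio (DOR) from sensitivity and specificity and forms the relative DOR (rDOR) comparing cross-validation to external validation, using the median values reported above. Note that the paper's summary rDOR of 3.26 is a random-effects meta-analytic estimate across the 28 studies, not the ratio of these medians.

```python
# Minimal sketch: how a diagnostic odds ratio (DOR) and the relative DOR
# (rDOR) are derived from sensitivity and specificity. The medians below
# come from the abstract; the paper's summary rDOR of 3.26 is pooled
# across studies and is not reproducible from medians alone.

def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] * [spec / (1 - spec)]."""
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))

dor_cv = diagnostic_odds_ratio(0.94, 0.98)  # median cross-validation performance
dor_ev = diagnostic_odds_ratio(0.88, 0.81)  # median external-validation performance

# rDOR > 1 means the classifier looked better in cross-validation
# than it did in independent validation.
rdor = dor_cv / dor_ev
print(f"DOR (cross-validation):     {dor_cv:.1f}")
print(f"DOR (external validation):  {dor_ev:.1f}")
print(f"rDOR:                       {rdor:.1f}")
```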
Figures
Figure 1:
Random effects meta-analysis of relative diagnostic odds ratios (rDORs) for 28 studies with cross-validation classification estimates and independent validation estimates for the same classifier. The majority of studies show worse classification performance in independent validation than in cross-validation, with a summary rDOR estimate of 3.26 (95% CI 2.04–5.21); an rDOR of 1 represents equal classification performance in cross-validation and independent validation.
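For readers unfamiliar with the pooling behind this kind of forest plot, the following is a minimal sketch of a DerSimonian-Laird random-effects meta-analysis on the log-rDOR scale. The per-study estimates and variances here are hypothetical placeholders, not the values from the 28 included studies.

```python
# Minimal sketch of a DerSimonian-Laird random-effects meta-analysis of
# log-rDORs. Study data are hypothetical placeholders. In practice the
# variance of a log-rDOR is the sum of the two log-DOR variances, each
# being the sum of reciprocal 2x2 cell counts.
import numpy as np

log_rdor = np.array([1.2, 0.8, 1.5, -0.1, 1.9, 0.6])      # hypothetical
var_log_rdor = np.array([0.30, 0.25, 0.50, 0.40, 0.60, 0.20])

# Fixed-effect weights and Cochran's Q heterogeneity statistic.
w = 1.0 / var_log_rdor
y_fixed = np.sum(w * log_rdor) / np.sum(w)
Q = np.sum(w * (log_rdor - y_fixed) ** 2)
k = len(log_rdor)

# DerSimonian-Laird estimate of between-study variance (tau^2).
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects weights, pooled estimate, and 95% CI on the rDOR scale.
w_star = 1.0 / (var_log_rdor + tau2)
pooled = np.sum(w_star * log_rdor) / np.sum(w_star)
se = 1.0 / np.sqrt(np.sum(w_star))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"summary rDOR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(lo):.2f}-{np.exp(hi):.2f})")
```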
Figure 2:
Random effects meta-analysis of relative diagnostic odds ratios (rDORs) for the 28 included studies, stratified by whether feature selection bias is likely to be present. The group with feature selection bias demonstrates significantly worse performance in independent validation than in cross-validation (rDOR 4.50, 95% CI 2.04–5.21), whereas this is not the case in the group that is unlikely to have this bias.
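The feature selection bias at issue arises when features are chosen on the full dataset before cross-validation begins, so the held-out folds have already influenced the classifier. A minimal sketch, assuming scikit-learn and arbitrary data dimensions: on pure-noise data, the leaky procedure reports optimistic accuracy, while a pipeline that refits selection within each training fold stays near chance.

```python
# Minimal sketch of feature selection bias: selecting features on the
# full dataset before cross-validation leaks information from the
# held-out folds. On pure-noise data the correct pipeline estimates
# ~50% accuracy; the leaky version looks far better.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))   # 60 samples, 5000 noise "genes"
y = rng.integers(0, 2, size=60)   # labels carry no signal

# Biased: feature selection sees all samples, including future test folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
acc_biased = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# Unbiased: selection is refit inside each training fold via a Pipeline.
pipe = Pipeline([("select", SelectKBest(f_classif, k=20)),
                 ("clf", LogisticRegression())])
acc_correct = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:   {acc_biased:.2f}")   # typically well above 0.5
print(f"correct CV accuracy: {acc_correct:.2f}")  # hovers near chance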
Figure 3:
Random effects meta-analysis of relative diagnostic odds ratios (rDORs) for the 28 included studies, stratified by whether optimization bias (i.e. reporting the best result from cross-validation rather than the average result across all cross-validations) is likely to be present. The group with optimization bias demonstrates significantly worse performance in independent validation than in cross-validation (rDOR 3.20, 95% CI 1.99–5.15). The group that is unlikely to have this bias has a more extreme but non-significant point estimate.
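Optimization bias can be illustrated in the same way: repeating cross-validation over many random partitions of pure-noise data and reporting the best run, rather than the average, inflates the estimate. A minimal sketch, again assuming scikit-learn:

```python
# Minimal sketch of optimization bias: running cross-validation many
# times over different random partitions and reporting the best run
# inflates the estimate relative to the average. Data are pure noise,
# so the honest expectation is chance (~50%).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 30))
y = rng.integers(0, 2, size=60)

scores = []
for seed in range(50):  # 50 re-randomized cross-validations
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores.append(cross_val_score(LogisticRegression(), X, y, cv=cv).mean())

print(f"average over all CV runs: {np.mean(scores):.2f}")  # near 0.50
print(f"best single CV run:       {np.max(scores):.2f}")   # optimistic
```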
Similar articles
- Comparison of feature selection and classification for MALDI-MS data.
  Liu Q, Sung AH, Qiao M, Chen Z, Yang JY, Yang MQ, Huang X, Deng Y. BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S3. doi: 10.1186/1471-2164-10-S1-S3. PMID: 19594880. Free PMC article.
- Bias in error estimation when using cross-validation for model selection.
  Varma S, Simon R. BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91. PMID: 16504092. Free PMC article.
- Reviewing ensemble classification methods in breast cancer.
  Hosni M, Abnane I, Idri A, Carrillo de Gea JM, Fernández Alemán JL. Comput Methods Programs Biomed. 2019 Aug;177:89-112. doi: 10.1016/j.cmpb.2019.05.019. Epub 2019 May 20. PMID: 31319964. Review.
- Comparison of multivariate classifiers and response normalizations for pattern-information fMRI.
  Misaki M, Kim Y, Bandettini PA, Kriegeskorte N. Neuroimage. 2010 Oct 15;53(1):103-18. doi: 10.1016/j.neuroimage.2010.05.051. Epub 2010 May 23. PMID: 20580933. Free PMC article.
- Classification based upon gene expression data: bias and precision of error rates.
  Wood IA, Visscher PM, Mengersen KL. Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28. PMID: 17392326. Review.
Cited by
- Access to ground truth at unconstrained size makes simulated data as indispensable as experimental data for bioinformatics methods development and benchmarking.
  Sandve GK, Greiff V. Bioinformatics. 2022 Oct 31;38(21):4994-4996. doi: 10.1093/bioinformatics/btac612. PMID: 36073940. Free PMC article. No abstract available.
- Increasing value and reducing waste in research design, conduct, and analysis.
  Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Lancet. 2014 Jan 11;383(9912):166-75. doi: 10.1016/S0140-6736(13)62227-8. Epub 2014 Jan 8. PMID: 24411645. Free PMC article.
- Why your new cancer biomarker may never work: recurrent patterns and remarkable diversity in biomarker failures.
  Kern SE. Cancer Res. 2012 Dec 1;72(23):6097-101. doi: 10.1158/0008-5472.CAN-12-3232. Epub 2012 Nov 19. PMID: 23172309. Free PMC article. Review.
- Clinical outcome prediction by microRNAs in human cancer: a systematic review.
  Nair VS, Maeda LS, Ioannidis JP. J Natl Cancer Inst. 2012 Apr 4;104(7):528-40. doi: 10.1093/jnci/djs027. Epub 2012 Mar 6. PMID: 22395642. Free PMC article. Review.
- Generalizing predictions to unseen sequencing profiles via deep generative models.
  Oh M, Zhang L. Sci Rep. 2022 May 3;12(1):7151. doi: 10.1038/s41598-022-11363-w. PMID: 35504956. Free PMC article.
Grants and funding
- K08 HL102265/HL/NHLBI NIH HHS/United States
- K08 HL102265-02/HL/NHLBI NIH HHS/United States
- UL1 RR025752/RR/NCRR NIH HHS/United States