Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis
E W Steyerberg et al. J Clin Epidemiol. 1999 Oct.
Abstract
Stepwise selection methods are widely applied to identify covariables for inclusion in regression models. One of the problems of stepwise selection is biased estimation of the regression coefficients. We illustrate this "selection bias" with logistic regression in the GUSTO-I trial (40,830 patients with an acute myocardial infarction). Random samples were drawn that included 3, 5, 10, 20, or 40 events per variable (EPV). Backward stepwise selection was applied in models containing 8 or 16 pre-specified predictors of 30-day mortality. We found a considerable overestimation of the regression coefficients of selected covariables. The selection bias decreased with increasing EPV. For EPV 3, 10, or 40, the bias exceeded 25% for 7, 3, and 1 of the covariables in the 8-predictor model, respectively, when a conventional selection criterion was used (alpha = 0.05). For these EPV values, the bias was less than 20% for all covariables when no selection was applied. We conclude that stepwise selection may result in a substantial bias of estimated regression coefficients.
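The mechanism behind this selection bias can be sketched with a much simpler setup than the one used in the study. The sketch below is not the authors' GUSTO-I analysis and does not fit a logistic model or run backward stepwise selection. It assumes a single coefficient whose estimate is normally distributed around a hypothetical true value (`TRUE_BETA`), and it "selects" the estimate only when a Wald-type test is significant at a two-sided alpha of 0.05. The standard errors, the function name `simulate_selection_bias`, and all numeric settings are illustrative assumptions; a smaller standard error stands in loosely for a higher EPV.

```python
import random
import statistics


def simulate_selection_bias(true_beta, std_error, z_crit=1.96, n_sim=20000, seed=1):
    """Return (mean of all estimates, mean of selected estimates, share selected).

    Simplifying assumption: each estimate is one draw from a normal distribution
    centred on true_beta, rather than a fitted logistic regression coefficient.
    """
    rng = random.Random(seed)
    all_estimates, selected = [], []
    for _ in range(n_sim):
        estimate = rng.gauss(true_beta, std_error)
        all_estimates.append(estimate)
        # Keep the estimate only if it passes a two-sided test at alpha = 0.05.
        if abs(estimate / std_error) > z_crit:
            selected.append(estimate)
    return (
        statistics.fmean(all_estimates),
        statistics.fmean(selected),
        len(selected) / n_sim,
    )


TRUE_BETA = 0.5  # hypothetical log odds ratio, not a value from the study
results = {}
for std_error in (0.40, 0.20, 0.10):  # hypothetical; smaller ~ more events per variable
    mean_all, mean_selected, share = simulate_selection_bias(TRUE_BETA, std_error)
    relative_bias = 100 * (mean_selected - TRUE_BETA) / TRUE_BETA
    results[std_error] = (mean_all, mean_selected, share, relative_bias)
    print(
        f"SE={std_error:.2f}  selected={share:.0%}  "
        f"mean(all)={mean_all:.3f}  mean(selected)={mean_selected:.3f}  "
        f"relative bias={relative_bias:+.0f}%"
    )
```

Under these assumptions, the average over all estimates stays close to the true value, while the average over the selected estimates is pulled away from zero, and the gap narrows as the standard error shrinks. This mirrors, in a stylized way, the pattern reported above: overestimation among selected covariables that diminishes with increasing EPV.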