Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models
Nicholas J Horton et al. Am Stat. 2007 Feb.
Abstract
Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness has been actively pursued in recent years, including imputation, likelihood, and weighting approaches. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While the current implementations still have limitations and require additional effort from the analyst, it is feasible to incorporate partially observed values, and these methods should be used in practice.
Figures
Figure 1. Monotone and non-monotone patterns of missingness (Obs = observed, M = missing)
Figure 2. Use of Likelihood based approach with EM algorithm to incorporate partially
Figure 3. Proposed guidelines for reporting missing covariate data (Burton and Altman 2004)
Figure 4. Description of missing data (using Stata misschk function)
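The distinction between monotone and non-monotone missingness in Figure 1, and the kind of pattern tabulation described in Figure 4, can be illustrated with a minimal Python sketch. It is not the paper's code and not a reimplementation of Stata's misschk: the function names `missing_patterns` and `is_monotone` and the toy data are assumptions made for illustration, with `None` standing in for a missing value. A pattern is treated as monotone when the variables can be ordered so that once a value is missing in a row, every later variable in that row is missing too.

```python
from collections import Counter


def missing_patterns(rows):
    """Count how often each observed/missing pattern occurs (hypothetical helper)."""
    return Counter(
        tuple("M" if value is None else "Obs" for value in row) for row in rows
    )


def is_monotone(rows):
    """Return True if the variables can be ordered so that missingness is monotone.

    Under a monotone pattern the sets of rows missing each variable are nested,
    so ordering variables from fewest to most missing values yields a valid order
    whenever one exists.
    """
    n_vars = len(rows[0])
    missing_counts = [sum(row[j] is None for row in rows) for j in range(n_vars)]
    order = sorted(range(n_vars), key=lambda j: missing_counts[j])
    for row in rows:
        seen_missing = False
        for j in order:
            if row[j] is None:
                seen_missing = True
            elif seen_missing:
                # An observed value follows a missing one: not monotone.
                return False
    return True


# Toy data (illustrative only): three variables per row, None marks a missing value.
monotone_data = [(1.0, 2.0, 3.0), (1.5, 2.5, None), (0.7, None, None)]
non_monotone_data = [(1.0, 2.0, 3.0), (1.5, None, 3.2), (None, 2.2, 3.1)]

if __name__ == "__main__":
    for pattern, count in missing_patterns(non_monotone_data).items():
        print(pattern, count)
    print(is_monotone(monotone_data), is_monotone(non_monotone_data))
```

Tabulating patterns before fitting a model is a cheap way to see how many distinct patterns exist and whether they are monotone, which bears on how complicated the abstract's imputation, likelihood, or weighting approaches will be to apply.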