Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models
Nicholas J Horton et al. Am Stat. 2007 Feb.
Abstract
Missing data are a recurring problem that can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness, including imputation, likelihood, and weighting approaches, has been actively pursued in recent years. Each approach is more complicated when there are many patterns of missing values, or when both categorical and continuous random variables are involved. Implementations of routines to incorporate observations with incomplete variables in regression models are now widely available. We review these routines in the context of a motivating example from a large health services research dataset. While there are still limitations to the current implementations, and additional efforts are required of the analyst, it is feasible to incorporate partially observed values, and these methods should be utilized in practice.
Figures
Figure 1
Monotone and non-monotone patterns of missingness (Obs=observed, M=missing)
Figure 2
Use of Likelihood based approach with EM algorithm to incorporate partially
Figure 3
Proposed guidelines for reporting missing covariate data (Burton and Altman 2004)
Figure 4
Description of missing data (using Stata misschk function)
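The distinction between monotone and non-monotone patterns named in the Figure 1 caption can be checked mechanically. The following is a minimal sketch, not taken from the article: it assumes missing values are encoded as `None`, and the helper name `is_monotone` is hypothetical. A pattern is treated as monotone when the variables can be reordered so that, once a variable is missing for an observation, all later variables are missing as well. This is equivalent to the per-variable sets of incomplete observations forming a nested chain.

```python
from typing import Optional, Sequence


def is_monotone(rows: Sequence[Sequence[Optional[float]]]) -> bool:
    """Return True if the columns can be ordered so that missingness is monotone.

    Assumption for this sketch: a missing value is represented by None.
    """
    if not rows:
        return True
    n_cols = len(rows[0])
    # For each column, collect the indices of the rows where it is missing.
    missing_sets = [
        frozenset(i for i, row in enumerate(rows) if row[j] is None)
        for j in range(n_cols)
    ]
    # Order columns from least to most missing; a monotone pattern exists
    # exactly when each set is contained in the next one.
    ordered = sorted(missing_sets, key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Monotone: the third variable drops out first, then the second.
monotone_example = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [1.0, None, None],
]

# Non-monotone: different observations are missing different variables.
non_monotone_example = [
    [1.0, None, 3.0],
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
]

print(is_monotone(monotone_example))
print(is_monotone(non_monotone_example))
```

Sorting by the amount of missingness is enough here because nested sets with the same size must be identical, so ties do not affect the result.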