Maximum Likelihood Multiple Imputation: Faster Imputations and Consistent Standard Errors Without Posterior Draws
Paul T. von Hippel (1), Jonathan W. Bartlett (2)
(1) Paul T. von Hippel is Associate Professor, LBJ School of Public Affairs, University of Texas, Austin, Texas, USA 78712
(2) Jonathan W. Bartlett is Reader in Statistics, University of Bath, BA2 7AY, UK
Statist. Sci. 36(3): 400-420 (August 2021). DOI: 10.1214/20-STS793
Abstract
Multiple imputation (MI) is a method for repairing and analyzing data with missing values. MI replaces missing values with a sample of random values drawn from an imputation model. The most popular form of MI, which we call posterior draw multiple imputation (PDMI), draws the parameters of the imputation model from a Bayesian posterior distribution. An alternative, which we call maximum likelihood multiple imputation (MLMI), estimates the parameters of the imputation model using maximum likelihood (or equivalent). Compared to PDMI, MLMI is faster and yields slightly more efficient point estimates.
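The contrast between the two approaches can be illustrated with a minimal sketch, assuming the simplest possible imputation model: a single normal variable with values missing completely at random. The function names, the noninformative prior used for the posterior draws, and the toy data are illustrative assumptions; this is not the authors' implementation and not the API of the mlmi package.

```python
import random
import statistics


def mlmi_impute(observed, n_missing, m, rng):
    """MLMI sketch: estimate (mu, sigma) once by maximum likelihood,
    then reuse the same estimates for all m imputed datasets."""
    mu_hat = statistics.mean(observed)
    # The ML estimate of the standard deviation divides by n, not n - 1.
    sigma_hat = statistics.pstdev(observed, mu=mu_hat)
    return [[rng.gauss(mu_hat, sigma_hat) for _ in range(n_missing)]
            for _ in range(m)]


def pdmi_impute(observed, n_missing, m, rng):
    """PDMI sketch: draw fresh (mu, sigma^2) from the posterior for each
    imputed dataset. Assumes the noninformative prior p(mu, sigma^2)
    proportional to 1 / sigma^2."""
    n = len(observed)
    ybar = statistics.mean(observed)
    ss = sum((y - ybar) ** 2 for y in observed)
    datasets = []
    for _ in range(m):
        # sigma^2 | y is ss / chi2, with chi2 on n - 1 degrees of freedom.
        # A chi-square(k) variate is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = ss / chi2
        # mu | sigma^2, y is Normal(ybar, sigma^2 / n).
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        datasets.append([rng.gauss(mu, sigma2 ** 0.5)
                         for _ in range(n_missing)])
    return datasets


# Toy data (hypothetical): 50 observed values, 5 missing, M = 3 imputations.
data_rng = random.Random(0)
observed_values = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
ml_datasets = mlmi_impute(observed_values, 5, 3, random.Random(1))
pd_datasets = pdmi_impute(observed_values, 5, 3, random.Random(1))
```

In this sketch the MLMI function does its parameter estimation once, outside the imputation loop, while the PDMI function needs a new parameter draw for every imputed dataset. That structural difference is where the speed advantage described above would come from in more complex models.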
A past barrier to using MLMI was the difficulty of estimating the standard errors of MLMI point estimates. We derive, implement and evaluate three consistent standard error formulas: (1) one combines variances within and between the imputed datasets, (2) one uses the score function and (3) one uses the bootstrap with two imputations of each bootstrapped sample. Formula (1) modifies for MLMI a formula that has long been used under PDMI, while formulas (2) and (3) can be used without modification under either PDMI or MLMI. We have implemented MLMI and the standard error estimators in the mlmi and bootImpute packages for R.
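For orientation, the long-used PDMI formula that combines within and between variances is commonly known as Rubin's rules. The sketch below shows only that standard PDMI version, with hypothetical function and variable names. The MLMI modification derived in the article is not reproduced here, and this unmodified formula should not be assumed to give consistent standard errors under MLMI.

```python
import statistics


def rubin_pdmi_se(estimates, variances):
    """Combine M point estimates and their squared standard errors
    using Rubin's rules (the standard PDMI formula)."""
    m = len(estimates)
    point = statistics.mean(estimates)       # pooled point estimate
    within = statistics.mean(variances)      # W: average within-imputation variance
    between = statistics.variance(estimates) # B: variance of estimates across imputations (divides by M - 1)
    total = within + (1 + 1 / m) * between   # T = W + (1 + 1/M) B
    return point, total ** 0.5


# Hypothetical estimates from M = 3 imputed datasets.
pooled_estimate, pooled_se = rubin_pdmi_se([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The inputs are one point estimate and one squared standard error per imputed dataset, so the function needs at least two imputations to compute the between-imputation variance.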
Citation
Paul T. von Hippel, Jonathan W. Bartlett. "Maximum Likelihood Multiple Imputation: Faster Imputations and Consistent Standard Errors Without Posterior Draws." Statist. Sci. 36 (3): 400-420, August 2021. https://doi.org/10.1214/20-STS793
Information
Published: August 2021
First available in Project Euclid: 28 July 2021
Digital Object Identifier: 10.1214/20-STS793
Keywords: incomplete data, missing data
Rights: Copyright © 2021 Institute of Mathematical Statistics