Handling Missing Data with Expectation Maximization Algorithm

Some Simplifications for the Expectation Maximization (EM) Algorithm: The Linear Regression Model Case

The EM algorithm is a generic tool that offers maximum likelihood solutions when datasets are incomplete, with data values missing at random or missing completely at random. At least in its simplest form, the algorithm can be rewritten in terms of an ANCOVA regression specification. This formulation allows several analytical results to be derived that permit the EM algorithm solution to be expressed in terms of new observation predictions and their variances. Implementations can be made with a linear regression or a nonlinear regression model routine, allowing missing value imputations even when they must satisfy constraints. Fourteen example datasets gleaned from the EM algorithm literature are reanalyzed. Imputation results have been verified with SAS PROC MI. Six theorems are proved that broadly contextualize imputation findings in terms of the theory, methodology, and practice of statistical science.
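The link between the EM solution and regression predictions can be illustrated in the simplest setting, where only response values are missing and the covariates are fully observed. The following is a minimal sketch under that assumption, using simulated data; the function name `em_impute_response` and all variable names are hypothetical and are not taken from the paper. It alternates a least-squares fit on the completed data with replacing the missing responses by their fitted values, and then compares the result with an ordinary complete-case regression.

```python
import numpy as np

def em_impute_response(X, y, n_iter=200, tol=1e-10):
    """EM-style imputation for missing responses in a linear regression.

    X: (n, p) design matrix, fully observed (include a column of ones).
    y: (n,) response with np.nan marking missing values.
    Returns the completed response and the coefficient estimates.
    """
    miss = np.isnan(y)
    y_fill = y.copy()
    y_fill[miss] = np.nanmean(y)  # crude starting values
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # M-step: least squares on the completed data
        beta_new, *_ = np.linalg.lstsq(X, y_fill, rcond=None)
        # E-step: replace missing responses by their conditional means
        y_fill[miss] = X[miss] @ beta_new
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return y_fill, beta

# Simulated data (an assumption made purely for illustration)
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y_obs = y.copy()
y_obs[rng.choice(n, size=10, replace=False)] = np.nan

y_em, beta_em = em_impute_response(X, y_obs)

# Complete-case least squares fit for comparison
obs = ~np.isnan(y_obs)
beta_cc, *_ = np.linalg.lstsq(X[obs], y_obs[obs], rcond=None)
pred_cc = X[~obs] @ beta_cc  # predictions for the rows with missing y
```

In this setting the iteration settles on the complete-case coefficients, so the imputed values coincide with the regression predictions for the incomplete rows. The sketch covers only the coefficients and imputed values; estimating the residual variance requires an additional correction term in the E-step that is not shown here.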

Missing data imputation in multivariate t distribution with unknown degrees of freedom using expectation maximization algorithm and its stochastic variants

Model Assisted Statistics and Applications, 2020

Many researchers encounter the missing data problem. It may arise from data omission, non-response, death of respondents, or recording errors, among other causes. It is important to find an appropriate data imputation technique to fill in the missing positions. In this study, the Expectation Maximization (EM) algorithm and two of its stochastic variants, stochastic EM (SEM) and Monte Carlo EM (MCEM), are employed for missing data imputation and parameter estimation in the multivariate t distribution with unknown degrees of freedom. The imputation efficiencies of the three methods are then compared using the mean square error (MSE) criterion. SEM yields the lowest MSE, making it the most efficient method of data imputation when the data follow the multivariate t distribution. The algorithm’s stochastic nature enables it to avoid local saddle points and achieve global maxima, ultimately increasing its efficiency. The EM and MCEM techniques yield very similar results. Large sample d...

Optimal imputation of the missing data using multi auxiliary information

Computational Statistics, 2020

This article proposes some new imputation methods that extend the work of Bhushan and Pandey using multi-auxiliary information. Popular imputation methods such as mean imputation, ratio imputation, regression imputation, and the power transformation method are special cases of the proposed methods and are less efficient than them. The proposed imputation methods can be considered as an efficient extension to the work
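For orientation, the classical methods named above can be sketched in their textbook single-auxiliary-variable forms. This is only an illustration of the special cases, not of the multi-auxiliary estimators the article proposes; the function names and the small dataset are hypothetical.

```python
import numpy as np

def mean_imputation(y):
    """Fill missing y with the mean of the responding units."""
    r = ~np.isnan(y)
    out = y.copy()
    out[~r] = y[r].mean()
    return out

def ratio_imputation(y, x):
    """Fill missing y with (ybar_r / xbar_r) * x_i, using the auxiliary variable x."""
    r = ~np.isnan(y)
    out = y.copy()
    out[~r] = (y[r].mean() / x[r].mean()) * x[~r]
    return out

def regression_imputation(y, x):
    """Fill missing y with ybar_r + b * (x_i - xbar_r), b being the OLS slope on responders."""
    r = ~np.isnan(y)
    b = np.cov(x[r], y[r], ddof=1)[0, 1] / np.var(x[r], ddof=1)
    out = y.copy()
    out[~r] = y[r].mean() + b * (x[~r] - x[r].mean())
    return out

# Hypothetical data: x is fully observed, y has two non-respondents
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([21.0, 39.0, np.nan, 82.0, np.nan])

y_mean = mean_imputation(y)
y_ratio = ratio_imputation(y, x)
y_reg = regression_imputation(y, x)
```

Mean imputation ignores the auxiliary variable entirely, while the ratio and regression forms use it through a proportionality constant and a fitted slope, respectively.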

How efficient is estimation with missing data

2011

In this paper, we present a new evaluation approach for missing data techniques (MDTs) in which their efficiency is investigated using the listwise deletion method as a reference. We experiment on classification problems and calculate misclassification rates (MR) for different missing data percentages (MDP) using a missing completely at random (MCAR) scheme. We compare three MDTs: pairwise deletion (PW), mean imputation (MI), and a maximum likelihood method that we call complete expectation maximization (CEM). We use a synthetic dataset, the Iris dataset, and the Pima Indians Diabetes dataset. We train a Gaussian mixture model (GMM). We test the trained GMM for two cases, in which the test dataset either has missing values or is complete. The results show that CEM is the most efficient method in both cases, while MI is the worst performer of the three. PW and CEM prove to be more stable than MI, in particular for higher MDP values.
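The data-preparation side of such an experiment can be sketched as follows. This is a minimal illustration, assuming that MCAR is applied independently to every entry and using simulated Gaussian data; the helper names are hypothetical, and the classifier training and the CEM method from the paper are not reproduced.

```python
import numpy as np

def mcar_mask(shape, mdp, rng):
    """Boolean mask: each entry is missing independently with probability mdp."""
    return rng.random(shape) < mdp

def listwise_deletion(X):
    """Keep only the rows that have no missing entries (the reference method)."""
    return X[~np.isnan(X).any(axis=1)]

def mean_impute(X):
    """Replace each missing entry with the mean of the observed values in its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))  # simulated stand-in for a real dataset

results = {}
for mdp in (0.1, 0.3):
    X_miss = X.copy()
    X_miss[mcar_mask(X.shape, mdp, rng)] = np.nan
    results[mdp] = {
        "rows_kept_listwise": listwise_deletion(X_miss).shape[0],
        "imputed": mean_impute(X_miss),
        "masked": X_miss,
    }
    print(f"MDP={mdp:.0%}: listwise deletion keeps "
          f"{results[mdp]['rows_kept_listwise']} of {X.shape[0]} rows")
```

Listwise deletion loses whole rows as the missing data percentage grows, whereas mean imputation keeps every row at the cost of distorting the column distributions.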

Improved estimators for mean estimation in presence of missing information

Alexandria Engineering Journal, 2021

The treatment of incomplete data is an important step in the statistical analysis of most survey datasets. Missing values create a difficult situation for survey researchers trying to produce precise estimates of the desired population parameters. To handle these situations, imputation methods play a significant role in filling in incomplete response values when it is necessary to use information on complete sampled units and not to discard the data with missingness. Keeping this in mind, our aim is to propose various improved exponential type imputation methods and the corresponding resultant estimators by using ancillary information. The properties (biases and mean square errors) of the developed estimators have been examined. It has been shown that the estimators of the population mean under similar circumstances due to Prasad [1-3] and some other estimators are special cases of our suggested class of estimators. Results are obtained by using simulation studies, and they show the desired performance over other estimators.

EM Method of Estimation of Missing Data (Key Assumptions and Method for Applied Analysis)

2018

Missing data are pervasive and pose problems for many statistical procedures. We all should be using methods that treat missing data properly, rather than deleting data or using single imputation. Importantly, it is not difficult to implement these missing data methods. Relatively few absent observations on some variables can dramatically shrink the sample size. As a result, the precision of confidence intervals is harmed, statistical power weakens, and the parameter estimates may be biased. In this paper we use the EM method to estimate missing values and summarize how to apply the EM method to the estimation of missing data using SPSS.

Montasir A. A. Mohammed et al., Elixir Inter. Busi. Mgmt. 124 (2018) 52244-52247. International Business Management. Available online at www.elixirpublishers.com (Elixir International Journal). © 2018 Elixir. All rights reserved.

b) Missing at random (MAR): a weaker assumption than MCAR: The probability...

A new method of multiple imputation for completely (or almost completely) missing data

One of the important questions the researcher must answer when assessing data quality while preparing information for a data mining procedure is whether missing observations in the dataset are missing at random, and whether some form of imputation is needed. If all (or almost all) observations of a variable are missing, they cannot be classified as missing at random. Therefore, most known methods of imputation of missing values cannot be applied to this variable. This paper studies a particular way of creating imputations in datasets containing completely (or almost completely) missing variables. As shown in the paper, if no external data are available, the maximum entropy distribution is the only reasonable probability distribution for producing proper imputation in the case of such variables. Two examples of real-life epidemiological studies demonstrate this approach.

Some Applications of Expectation Maximization Algorithm

Nguyen, L. (2022, March 25). Some Applications of Expectation Maximization Algorithm (1st ed.). (O. Sabazova, Ed.) Eliva Press, 2022

The expectation maximization (EM) algorithm is a popular and powerful mathematical method for statistical parameter estimation when there are both observed data and hidden data. This book focuses on applications of EM in which an implicit relationship is essential to connect observed data and hidden data. In other words, such applications reinforce EM, which in turn extends estimation methods like maximum likelihood estimation (MLE) or the method of moments.

A Review On Missing Value Estimation Using Imputation Algorithm

Journal of Physics: Conference Series, 2017

The presence of missing values in a data set has always been a major problem for precise prediction. A method for imputing missing values needs to minimize the effect of incomplete data sets on the prediction model. Many algorithms have been proposed to counter the missing value problem. In this review, we provide a comprehensive analysis of existing imputation algorithms, focusing on the technique used and the use of global or local information from the data sets for missing value estimation. In addition, validation methods for imputation results and ways to measure the performance of imputation algorithms are described. The objective of this review is to highlight possible improvements to existing methods, and it is hoped that this review gives the reader a better understanding of trends in imputation methods.