Application of Mixture Models for Doubly Inflated Count Data
Related papers
EM Estimation for Zero- and k-Inflated Poisson Regression Model
Computation, 2021
Count data with excessive zeros are ubiquitous in healthcare, medical, and scientific studies. There are numerous articles that show how to fit Poisson and other models which account for the excessive zeros. However, in many situations, besides zero, the frequency of another count k tends to be higher in the data. The zero- and k-inflated Poisson distribution model (ZkIP) is appropriate in such situations. The ZkIP distribution is essentially a mixture of a Poisson distribution and degenerate distributions at the points zero and k. In this article, we study the fundamental properties of this mixture distribution. Using stochastic representation, we provide details for obtaining parameter estimates of the ZkIP regression model using the Expectation–Maximization (EM) algorithm for a given data set. We derive the standard errors of the EM estimates by computing the complete, missing, and observed data information matrices. We present the analysis of two real-life data using the methods outlined i...
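The mixture structure and EM idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's regression procedure: it assumes an intercept-only model (no covariates), k >= 1, and hypothetical names (`zkip_pmf`, `sample_zkip`, `fit_zkip_em`, `pi0`, `pik`, `lam`) that do not come from the source. The E-step computes how many of the observed zeros and k's are expected to come from the point masses, and the M-step updates the three parameters in closed form.

```python
import math
import random


def poisson_pmf(y, lam):
    """Poisson(lam) probability of the count y, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zkip_pmf(y, k, pi0, pik, lam):
    """ZkIP pmf: point masses at 0 and k mixed with a Poisson(lam) component."""
    p = (1.0 - pi0 - pik) * poisson_pmf(y, lam)
    if y == 0:
        p += pi0
    if y == k:
        p += pik
    return p


def sample_zkip(n, k, pi0, pik, lam, seed=0):
    """Draw n counts by first picking a mixture component, then sampling from it."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        if u < pi0:
            out.append(0)
        elif u < pi0 + pik:
            out.append(k)
        else:
            # Knuth's multiplication method for a Poisson draw (fine for small lam).
            limit, count, prod = math.exp(-lam), 0, rng.random()
            while prod > limit:
                count += 1
                prod *= rng.random()
            out.append(count)
    return out


def fit_zkip_em(data, k, n_iter=1000, tol=1e-10):
    """EM for an intercept-only ZkIP model with k >= 1; returns (pi0, pik, lam)."""
    n, total = len(data), sum(data)
    n0 = sum(1 for y in data if y == 0)
    nk = sum(1 for y in data if y == k)
    pi0, pik, lam = 0.1, 0.1, max(total / n, 1e-6)  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: expected number of zeros and k's that come from the point masses.
        s0 = n0 * pi0 / zkip_pmf(0, k, pi0, pik, lam)
        sk = nk * pik / zkip_pmf(k, k, pi0, pik, lam)
        # M-step: closed-form updates from the complete-data likelihood.
        new_pi0, new_pik = s0 / n, sk / n
        new_lam = max((total - k * sk) / max(n - s0 - sk, 1e-12), 1e-12)
        change = max(abs(new_pi0 - pi0), abs(new_pik - pik), abs(new_lam - lam))
        pi0, pik, lam = new_pi0, new_pik, new_lam
        if change < tol:
            break
    return pi0, pik, lam
```

For example, `fit_zkip_em(sample_zkip(20000, 3, 0.2, 0.15, 2.0), 3)` should return estimates close to the simulating values. A regression version would replace the closed-form updates with weighted fits of the mixing probabilities and the Poisson mean on covariates.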
Properties of the zero-and-one inflated Poisson distribution and likelihood-based inference methods
Statistics and Its Interface, 2016
To model count data with excess zeros and excess ones, in their unpublished manuscript, Melkersson and Olsson (1999) extended the zero-inflated Poisson distribution to a zero-and-one-inflated Poisson (ZOIP) distribution. However, the distributional theory and corresponding properties of the ZOIP have not yet been explored, and likelihood-based inference methods for parameters of interest were not well developed. In this paper, we extensively study the ZOIP distribution by first constructing five equivalent stochastic representations for the ZOIP random variable and then deriving other important distributional properties. Maximum likelihood estimates of parameters are obtained by both the Fisher scoring and expectation-maximization algorithms. Bootstrap confidence intervals for parameters of interest and testing hypotheses under large sample sizes are provided. Simulation studies are performed and five real data sets are used to illustrate the proposed methods.
Zero to k Inflated Poisson Regression Models with Applications
Journal of Statistical Theory and Applications
In a count data set, some values may occur more frequently than expected under standard models. Indeed, in many situations, the frequencies of zero and of some other points tend to be higher than those implied by the Poisson distribution. Adapting existing models for analyzing inflated observations has been studied in the literature. One method for modeling inflated data is the inflated distribution. In this paper, we extend this inflated distribution. Indeed, if inflation occurs at three or more of the support points, then the previous models are not suitable. We propose a model based on zero, one, $\ldots$, and $k$ inflated points with probabilities $w_0, w_1, \ldots,$ and $w_k$, respectively. By choosing the appropriate values for the weights $w_0, \ldots, w_k$, various inflated distributions, such as the zero-inflated, zero–one-inflated, and zero–k-inflated distributions, are derived as special cases of the proposed mo...
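One natural way to write such a probability mass function is sketched below. The abstract gives only the weights $w_0, \ldots, w_k$; the Poisson baseline with mean $\lambda$ and the indicator notation are assumptions made here for illustration, and the paper's own parameterization may differ.

```latex
P(Y = y) \;=\; \sum_{j=0}^{k} w_j \,\mathbb{1}(y = j)
\;+\; \Bigl(1 - \sum_{j=0}^{k} w_j\Bigr) \frac{e^{-\lambda}\lambda^{y}}{y!},
\qquad y = 0, 1, 2, \ldots
```

Under this form, setting $w_1 = \cdots = w_k = 0$ gives the zero-inflated Poisson, setting $w_2 = \cdots = w_k = 0$ gives the zero–one-inflated Poisson, and setting $w_1 = \cdots = w_{k-1} = 0$ gives the zero–k-inflated Poisson.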
Type I multivariate zero-inflated generalized Poisson distribution with applications
Statistics and Its Interface, 2017
Excessive zeros in multivariate count data are often encountered in practice. Since the Poisson distribution only possesses the property of equi-dispersion, the existing Type I multivariate zero-inflated Poisson distribution (Liu and Tian, 2015, CSDA) [15] cannot be used to model multivariate zero-inflated count data with over-dispersion or under-dispersion. In this paper, we extend the univariate zero-inflated generalized Poisson (ZIGP) distribution to the Type I multivariate ZIGP distribution via stochastic representation, aiming to model positively correlated multivariate zero-inflated count data with over-dispersion or under-dispersion. Its distributional theories and associated properties are derived. Due to the complexity of the ZIGP model, we provide four useful algorithms (a very fast Fisher-scoring algorithm, an expectation/conditional-maximization algorithm, a simple EM algorithm and an explicit majorization-minimization algorithm) for finding maximum likelihood estimates of parameters of interest and develop efficient statistical inference methods for the proposed model. Simulation studies for investigating the accuracy of point estimates and confidence interval estimates and comparing the likelihood ratio test with the score test are conducted. Under both AIC and BIC, our analyses of the two data sets show that the Type I multivariate ZIGP model is superior to the Type I multivariate zero-inflated Poisson model.
arXiv (Cornell University), 2017
The main objective of this article is to present an extension of the zero-inflated Poisson-Lindley distribution, called the zero-modified Poisson-Lindley distribution. The additional parameter π of the zero-modified Poisson-Lindley distribution has a natural interpretation in terms of either a zero-deflated or a zero-inflated proportion. Inference is dealt with by using the likelihood approach. In particular, the maximum likelihood estimators of the distribution's parameters are compared in small and large samples. We also consider an alternative bias-correction mechanism based on Efron's bootstrap resampling. The model is applied to real data sets and found to perform better than other competing models.
Marginalized mixture models for count data from multiple source populations
Journal of Statistical Distributions and Applications, 2017
Mixture distributions provide flexibility in modeling data collected from populations having unexplained heterogeneity. While interpretations of regression parameters from traditional finite mixture models are specific to unobserved subpopulations or latent classes, investigators are often interested in making inferences about the marginal mean of a count variable in the overall population. Recently, marginal mean regression modeling procedures for zero-inflated count outcomes have been introduced within the framework of maximum likelihood estimation of zero-inflated Poisson and negative binomial regression models. In this article, we propose marginalized mixture regression models based on two-component mixtures of non-degenerate count data distributions that provide directly interpretable estimates of exposure effects on the overall population mean of a count outcome. The models are examined using simulations and applied to two datasets, one from a double-blind dental caries incidence trial, and the other from a horticultural experiment. The finite sample performance of the proposed models is compared with each other and with marginalized zero-inflated count models, as well as ordinary Poisson and negative binomial regression.
Analysis of zero-inflated clustered count data: A marginalized model approach
Computational Statistics & Data Analysis, 2011
Min and Agresti (2005) proposed random effect hurdle models for zero-inflated clustered count data with two-part random effects for a binary component and a truncated count component. In this paper, we propose new marginalized models for zero-inflated clustered count data using random effects. The marginalized models are similar to Dobbie and Welsh's (2001) model, in which generalized estimating equations were exploited to find estimates. However, our proposed models are based on a likelihood-based approach. A quasi-Newton algorithm is developed for estimation. We use these methods to carefully analyze two real datasets.
A Poisson-multinomial mixture approach to grouped and right-censored counts
Communications in Statistics - Theory and Methods, 2017
2 surveys in grouped and right-censored categories, there is a lack of statistical methods simultaneously taking both grouping and right-censoring into account. In this research, we propose a new generalized Poisson-multinomial mixture approach to model grouped and right-censored (GRC) count data. Based on a mixed Poisson-multinomial process for conceptualizing grouped and right-censored count data, we prove that the new maximum-likelihood estimator (MLE-GRC) is consistent and asymptotically normally distributed for both Poisson and zero-inflated Poisson models. The use of the MLE-GRC, implemented in an R function, is illustrated by both statistical simulation and empirical examples. This research provides a tool for epidemiologists to estimate incidence from grouped and right-censored count data and lays a foundation for regression analyses of such data structures.
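To make the idea of a likelihood for grouped and right-censored counts concrete, here is a small sketch for the plain Poisson case. It is not the paper's MLE-GRC R function: the `(lo, hi)` category layout, the function names, and the crude grid search are all assumptions chosen for illustration. Each category contributes the log of the Poisson probability of its interval, with `hi=None` marking the open-ended, right-censored category.

```python
import math


def poisson_pmf(y, lam):
    """Poisson(lam) probability of the count y, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def interval_prob(lo, hi, lam):
    """P(lo <= Y <= hi) for Y ~ Poisson(lam); hi=None means 'lo or more'."""
    if hi is None:
        return 1.0 - sum(poisson_pmf(y, lam) for y in range(lo))
    return sum(poisson_pmf(y, lam) for y in range(lo, hi + 1))


def grc_loglik(lam, groups, counts):
    """Multinomial log-likelihood (up to a constant) of the category counts."""
    total = 0.0
    for (lo, hi), n in zip(groups, counts):
        if n > 0:
            # Floor the probability so that rounding error never reaches log(0).
            total += n * math.log(max(interval_prob(lo, hi, lam), 1e-300))
    return total


def grc_mle_grid(groups, counts, candidates):
    """Crude estimate: the candidate rate with the largest log-likelihood."""
    return max(candidates, key=lambda lam: grc_loglik(lam, groups, counts))
```

A call might use categories such as `[(0, 0), (1, 2), (3, 5), (6, None)]` for "never", "1-2 times", "3-5 times", and "6 or more", with `candidates` taken as a fine grid of rates. A numerical optimizer would normally replace the grid search.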