Analysis of zero-inflated clustered count data: A marginalized model approach

Modeling Zero-inflated Clustered Count Data: A Semiparametric Approach

2014

This paper proposes an additive semiparametric Poisson regression for modeling zero-inflated clustered data. Two estimation methods based on de Vera (2010) are used. The first simultaneously estimates both the parametric and nonparametric parts of the model. The second uses the backfitting algorithm, smoothing the nonparametric function of the covariates and then estimating the parametric parts of the postulated model. The predictive accuracy of the proposed methods, measured in terms of root mean square error (RMSE), is compared to that of the ordinary zero-inflated Poisson (ZIP) regression model. In a simulation study, the average RMSE of the ordinary ZIP regression model is at most 81% and 27% higher for equal and unequal cluster sizes, respectively, than that of the proposed model whose parametric and nonparametric parts are simultaneously estimated.

A Marginalized Model for Zero-Inflated, Overdispersed, and Correlated Count Data

Iddi and Molenberghs (2012) merged the attractive features of the so-called combined model of Molenberghs et al. (2010) and the marginalized model of Heagerty (1999) for hierarchical non-Gaussian data with overdispersion. In this model, the fixed-effect parameters retain their marginal interpretation. Lee et al. (2011) also developed an extension of Heagerty (1999) to handle zero inflation in count data, using the hurdle model. To bring together all of these features, a marginalized, zero-inflated, overdispersed model for correlated count data is proposed. Using two empirical data sets, it is shown that the proposed model leads to important improvements in model fit.

Adequacy of H-Likelihood Estimation Method for Unbalanced Clustered Counting Data Models

2020

This article concentrates on hierarchical generalized linear models, including generalized linear mixed models, which extend linear models. In generalized linear models, the dependent variable may follow any distribution from the exponential family, e.g., normal, Poisson, binomial, or gamma. The Poisson-gamma model was applied, in which the dependent variable follows a Poisson distribution and the random effect follows a gamma distribution. Several estimation methods have been used for generalized linear models. In this study, the hierarchical likelihood (H-likelihood) estimation method was used, and its effectiveness was assessed for both balanced and unbalanced data. This article compares the adequacy of the Poisson-gamma H-likelihood estimation method for mixed-effects clustered data models with equal and unequal cluster sizes. Adequacy was evaluated in terms of type I error rate, power, and standard error by applying compute...

Zero-inflated count regression models with applications to some examples

Quality & Quantity, 2012

In this paper, we employ SAS PROC NLMIXED (the nonlinear mixed model procedure) to analyze three example data sets with inflated zeros. The examples include data with and without covariates. The covariates used in this article are binary, to simplify the analysis; the analysis can readily be extended to situations with several covariates having multiple levels. Models fitted include the Poisson (P), the negative binomial (NB), the generalized Poisson (GP), and their zero-inflated variants, namely the ZIP, ZINB, and ZIGP models, respectively. Parameter estimates and the appropriate goodness-of-fit (GOF) statistic (the deviance D) are computed, and in some cases Pearson's X² statistic, which is based on the variance of the relevant model distribution, is also computed. Expected frequencies under the models are also obtained, and GOF tests are conducted based on the rule established by Lawal (Appl Stat 29:292-298, 1980). Our results extend previous results on the analysis of the chosen data. Further, the results obtained are very consistent with previous analyses of the data sets chosen for this article. We also present a hierarchical figure relating all the models employed in this paper. While we do not pretend that the results obtained are entirely new, the analyses give researchers in the field a much-needed means of implementing these models in SAS without having to resort to S-PLUS, R, or Stata.

Random effect models for repeated measures of zero-inflated count data

Statistical Modelling, 2005

For count responses, the situation of excess zeros (relative to what standard models allow) often occurs in biomedical and sociological applications. Modeling repeated measures of zero-inflated count data presents special challenges. This is because in addition to the problem of extra zeros, the correlation between measurements upon the same subject at different occasions needs to be taken into account. This article discusses random effect models for repeated measurements on this type of response variable. A useful model is the hurdle model with random effects, which separately handles the zero observations and the positive counts. In maximum likelihood model fitting, we consider both a normal distribution and a nonparametric approach for the random effects. A special case of the hurdle model can be used to test for zero inflation. Random effects can also be introduced in a zero-inflated Poisson or negative binomial model, but such a model may encounter fitting problems if there is ...
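
The hurdle structure described above (a separate treatment of the zero observations and the positive counts) can be illustrated with a minimal Python sketch. It assumes a Poisson count distribution and no random effects, so it is a simplification rather than the model fitted in the article; the names `poisson_pmf`, `hurdle_poisson_pmf`, `p_zero`, and `lam` are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Ordinary Poisson probability P(Y = y) with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def hurdle_poisson_pmf(y, p_zero, lam):
    """P(Y = y) under a Poisson hurdle model (illustrative sketch).

    p_zero is the probability of a zero; positive counts follow a
    zero-truncated Poisson distribution with parameter lam.
    """
    if y == 0:
        return p_zero
    # Zero-truncated Poisson: renormalize over y >= 1.
    truncated = poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated
```

When `p_zero` equals `math.exp(-lam)`, the hurdle probabilities coincide with the ordinary Poisson probabilities, which is one way to see how a special case of the hurdle model can serve as a reference point for testing zero inflation.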

Statistical model for overdispersed count outcome with many zeros : an approach for marginal inference

2016

Marginalised models are in great demand by many researchers in the life sciences, particularly in clinical trials, epidemiology, health-economics, surveys and many others, since they allow generalisation of inference to the entire population under study. For count data, standard procedures such as the Poisson regression and negative binomial model provide population average inference for model parameters. However, occurrence of excess zero counts and lack of independence in empirical data have necessitated their extension to accommodate these phenomena. These extensions, though useful, complicate interpretations of effects. For example, the zero-inflated Poisson model accounts for the presence of excess zeros, but the parameter estimates do not have a direct marginal inferential ability as in the base model, the Poisson model. Marginalisations due to the presence of excess zeros are underdeveloped though demand for them is interestingly high. The aim of this paper, therefore, is to deve...

Statistical model for overdispersed count outcome with many zeros: an approach for direct marginal inference

2015

Marginalized models are in great demand by most researchers in the life sciences, particularly in clinical trials, epidemiology, health-economics, surveys and many others, since they allow generalization of inference to the entire population under study. For count data, standard procedures such as the Poisson regression and negative binomial model provide population average inference for model parameters. However, occurrence of excess zero counts and lack of independence in empirical data have necessitated their extension to accommodate these phenomena. These extensions, though useful, complicate interpretations of effects. For example, the zero-inflated Poisson model accounts for the presence of excess zeros, but the parameter estimates do not have a direct marginal inferential ability as in its base model, the Poisson model. Marginalizations due to the presence of excess zeros are underdeveloped though demand for such is interestingly high. The aim of this paper is to develop a margina...
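
The interpretation problem mentioned above can be made concrete with a short worked equation. This is a sketch under common textbook assumptions (a log link for the Poisson mean and a logit link for the zero-inflation probability); the symbols π, λ, ν, β, γ, α, x, and z are introduced here for illustration and are not taken from the truncated abstract.

```latex
% ZIP distribution with zero-inflation probability \pi and Poisson mean \lambda
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 1, 2, \dots

% Marginal (population-average) mean
\nu = E[Y] = (1 - \pi)\, \lambda

% With \log \lambda = x^{\top}\beta and \operatorname{logit} \pi = z^{\top}\gamma:
\log \nu = x^{\top}\beta + \log(1 - \pi)
         = x^{\top}\beta - \log\!\left(1 + e^{z^{\top}\gamma}\right)

% A marginalized formulation models \nu directly, for example
\log \nu = x^{\top}\alpha, \qquad \lambda = \frac{\nu}{1 - \pi}
```

Under these assumptions, exp(β_j) is a rate ratio within the latent Poisson class only; it equals the population-average rate ratio only when π does not depend on the covariates. In the marginalized formulation, exp(α_j) is a rate ratio for the overall mean ν.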

Application of Mixture Models for Doubly Inflated Count Data

Analytics

In health and social science and other fields where count data analysis is important, zero-inflated models have been employed when the frequency of zero counts is high (inflated). For various reasons, there are scenarios in which an additional count value k > 0 also occurs with high frequency. The zero- and k-inflated Poisson distribution model (ZkIP) is more appropriate for such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at counts 0 and k, and a Poisson distribution. In this article, we propose an alternative and computationally fast expectation–maximization (EM) algorithm to obtain the parameter estimates for grouped zero- and k-inflated count data. The asymptotic standard errors are derived using the complete data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts. The best model is selected based on commonly used criteria. The theoretical results are supplemen...
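
The three-component mixture described above can be written down directly. The following Python sketch computes the ZkIP probability mass function only; it does not implement the article's EM algorithm. The function name `zkip_pmf` and the parameter names `pi0`, `pik`, and `lam` are hypothetical, and the mixing weights are assumed to be nonnegative.

```python
import math


def zkip_pmf(y, k, pi0, pik, lam):
    """P(Y = y) for a zero- and k-inflated Poisson mixture (illustrative sketch).

    pi0: weight of the degenerate component at 0
    pik: weight of the degenerate component at k (k > 0)
    lam: mean of the Poisson component, which has weight 1 - pi0 - pik
    """
    if k <= 0:
        raise ValueError("k must be a positive count")
    if pi0 < 0 or pik < 0 or pi0 + pik > 1:
        raise ValueError("mixing weights must be nonnegative and sum to at most 1")
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    mass = (1.0 - pi0 - pik) * poisson
    if y == 0:
        mass += pi0  # extra mass from the point component at 0
    if y == k:
        mass += pik  # extra mass from the point component at k
    return mass
```

Setting `pik = 0` recovers the zero-inflated Poisson model, and setting both weights to 0 recovers the ordinary Poisson model, which are the counterparts the abstract compares against.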

A constrained marginal zero-inflated binomial regression model

Communications in Statistics - Theory and Methods, 2020

Zero-inflated models have become a popular tool for assessing the relationships between explanatory variables and a zero-inflated count outcome. In these models, regression coefficients have latent class interpretations, where the latent classes correspond to a susceptible subpopulation with observations generated from a count distribution and a non-susceptible subpopulation that provides only zero counts. However, it is often of interest to evaluate covariate effects in the overall mixture population, that is, on the marginal mean of the zero-inflated count response. Marginal zero-inflated models, such as the marginal zero-inflated Poisson and negative binomial models, have been developed for that purpose. They specify independent submodels for the susceptibility probability and the marginal mean of the count response. When the count outcome is bounded, it is tempting to formulate a marginal zero-inflated binomial model in the same fashion. This, however, is not possible, due to the inherent constraints that relate, in the zero-inflated binomial model, the susceptibility probability and the latent and marginal means of the count outcome. In this paper, we propose a marginal zero-inflated binomial regression model that accommodates these constraints. We construct maximum likelihood estimates of the regression parameters. Their asymptotic properties are established and their finite-sample behaviour is examined by simulations. An application of the proposed model to the analysis of health-care demand is provided for illustration.
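
One way to see the kind of constraint the abstract refers to is the following minimal Python sketch. It assumes the susceptible class follows a Binomial(m, p) distribution, so that the marginal mean is nu = s * m * p, where s is the susceptibility probability. The function name and the symbols `nu`, `s`, `m`, and `p` are hypothetical, and the sketch illustrates the constraint only, not the regression model proposed in the paper.

```python
def latent_success_prob(nu, s, m):
    """Latent binomial success probability implied by a marginal mean (sketch).

    nu: marginal mean of the zero-inflated binomial count
    s:  susceptibility probability, 0 < s <= 1
    m:  number of binomial trials (upper bound of the count)

    Since nu = s * m * p, the implied p is nu / (s * m). It is a valid
    probability only if 0 <= nu <= s * m, so nu and s cannot be specified
    independently of each other.
    """
    if not 0 < s <= 1:
        raise ValueError("s must lie in (0, 1]")
    p = nu / (s * m)
    if not 0 <= p <= 1:
        raise ValueError("marginal mean is incompatible with s and m")
    return p
```

For example, with `m = 10` and `s = 0.5`, a marginal mean of 2 implies a latent success probability of 2 / 5, whereas a marginal mean of 6 would require a probability above 1 and is rejected.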

A New Regression Model for the Analysis of Overdispersed and Zero-Modified Count Data

Entropy, 2021

Count datasets are traditionally analyzed using the ordinary Poisson distribution. However, the applicability of this model is limited, as it can be too restrictive to handle specific data structures. The need therefore arises for alternative models that accommodate, for example, overdispersion and zero modification (inflation/deflation at the frequency of zeros). In practical terms, these are the most prevalent structures ruling the nature of discrete phenomena nowadays. Hence, this paper’s primary goal was to jointly address these issues by deriving a fixed-effects regression model based on the hurdle version of the Poisson–Sujatha distribution. In this framework, the zero modification is incorporated by considering that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Posterior inferences for the model parameters were obtained from a fully Bayesian approach ba...