Zero-Inflated and Hurdle Models of Count Data with Extra Zeros: Examples from an HIV-Risk Reduction Intervention Trial
Related papers
Communications in Statistics - Theory and Methods, 2020
Count data often exhibit dispersion and contain a large number of zeros. To account for these properties, a new four-parameter generalized negative binomial-Lindley distribution is proposed, which includes the two-parameter and three-parameter negative binomial-Lindley distributions as special cases. Several statistical properties of the proposed distribution are presented. The dispersion index of the proposed distribution is derived; it shows that, depending on the choice of parameters, the distribution can adequately fit data exhibiting either overdispersion or underdispersion. The proposed distribution is fitted to three overdispersed datasets with a large proportion of zeros, and the best-fitting model is selected based on AIC, mean absolute error, and root mean squared error. The model fits indicate that the proposed distribution outperforms the Poisson and negative binomial distributions in fitting count data with overdispersion and a large number of zeros.
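The dispersion index mentioned above is, for a distribution, the ratio of variance to mean (1 for the Poisson). A minimal sketch of its sample analogue, alongside the proportion of zeros, is shown below; the toy sample is hypothetical and the helper names are illustrative, not taken from the paper.

```python
from statistics import mean, pvariance

def dispersion_index(counts):
    """Sample variance-to-mean ratio: about 1 for Poisson-like data,
    above 1 suggests overdispersion, below 1 underdispersion."""
    return pvariance(counts) / mean(counts)

def zero_proportion(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)

# Hypothetical toy sample with many zeros (not one of the paper's datasets)
sample = [0, 0, 0, 0, 0, 0, 1, 2, 5, 8]
print(f"dispersion index = {dispersion_index(sample):.3f}")
print(f"proportion of zeros = {zero_proportion(sample):.1f}")
```

Here the sample mean is 1.6 and the population variance is 6.84, so the index is about 4.3, i.e. strongly overdispersed with 60% zeros.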
Estimation Parameters And Modelling Zero Inflated Negative Binomial
CAUCHY, 2016
Regression analysis is used to determine the relationship between one or more response variables (Y) and one or more predictor variables (X). A regression model relating predictor variables to a Poisson-distributed response variable is called a Poisson regression model, and it is commonly used to analyze count data. Because Poisson regression requires the mean and variance to be equal, it is not appropriate for overdispersed data (variance greater than the mean). Count data also often contain a large proportion of zero values in the response variable (zero inflation). Poisson regression can be used to analyze count data, but it cannot handle an excess of zero values in the response variable. An alternative model that is more suitable for overdispersed data and can handle excess zeros in the response variable is the Zero Inflated Negative Binomial (ZIN...
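To make the zero-inflated negative binomial concrete, here is a minimal sketch of its probability mass function: with probability pi an observation is a structural zero, otherwise it comes from a negative binomial. The abstract is truncated, so the parameterization (mean mu, size/dispersion k, variance mu + mu^2/k) is an assumption, and the function names are illustrative.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and size k (variance mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y, pi, mu, k):
    """Zero-inflated NB: a structural zero with probability pi, otherwise NB(mu, k).
    Zeros therefore arise from both the point mass and the NB component."""
    count_part = (1 - pi) * nb_pmf(y, mu, k)
    return pi + count_part if y == 0 else count_part

# Hypothetical parameters: 30% structural zeros, NB mean 2, size 1
print(zinb_pmf(0, 0.3, 2.0, 1.0))  # 0.3 + 0.7 * (1/3), about 0.533
```

In ZINB regression, pi and mu are usually tied to covariates through a logit link and a log link respectively; the sketch fixes them as constants for clarity.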
Comparison of Statistical Models in Modeling Over-Dispersed Count Data with Excess Zeros
2019
Generalised linear models such as the Poisson and negative binomial models have been routinely used to model count data. However, these models' assumptions are violated when the data exhibit over-dispersion and zero inflation. Over-dispersion can be a result of excess zeros in the data. Several extensions of the negative binomial and Poisson models have been proposed for modelling data with these characteristics, such as zero-inflated and hurdle models. Our study focuses on identifying the most statistically fit model(s) for count data with over-dispersion and excess zeros. We simulate datasets with varying proportions of zeros and varying degrees of dispersion, then fit Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, Poisson hurdle, and negative binomial hurdle models to them. Model selection is based on AIC, log-likelihood, Vuong statistics, and box plots. The results suggest that the negative binomial hurdle model performed well in most scenarios compared with the other models and is hence the most statistically fit model for overdispersed count data with excess zeros.
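The key difference between the zero-inflated and hurdle families compared above is where zeros come from. A minimal Poisson-based sketch (function names illustrative): in the zero-inflated model zeros come from both a point mass and the count distribution, while in the hurdle model all zeros come from the binary part and positive counts follow a zero-truncated distribution.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam**y / math.factorial(y)

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: zeros from a point mass (prob. pi) AND from the Poisson part."""
    base = (1 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

def poisson_hurdle_pmf(y, p0, lam):
    """Poisson hurdle: every zero comes from the binary part (prob. p0);
    positive counts follow a zero-truncated Poisson."""
    if y == 0:
        return p0
    return (1 - p0) * poisson_pmf(y, lam) / (1 - math.exp(-lam))

# Same hypothetical parameters, different zero probabilities
print(zip_pmf(0, 0.3, 2.0))             # 0.3 + 0.7 * exp(-2)
print(poisson_hurdle_pmf(0, 0.3, 2.0))  # exactly 0.3
```

Replacing the Poisson with a negative binomial in either function gives the ZINB and negative binomial hurdle variants.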
IOP Conference Series: Materials Science and Engineering
Poisson regression analysis describes the relationship between predictor variables and a response variable that follows the Poisson distribution, whose variance equals its mean (λ), a situation called equidispersion. However, the variance can also be greater than the mean, which is called overdispersion. Overdispersion can be caused by an excess of zero values in the response variable (zero excess). In the analysis of overdispersed data, the parameters can be underestimated, so the results become biased. This bias can potentially be overcome by Zero Inflated Negative Binomial (ZINB) regression analysis. In the 2016 Maternal Mortality Rate data for Bojonegoro District, ZINB regression overcame the overdispersion, although no predictor variable was found to significantly affect the response variable. ZINB regression analysis can also be applied to simulated data. We generated data with means λ = (0.2, 0.4, 0.6, 0.8, 1.0, 5.0), proportions of zeros p = (0.4, 0.6, 0.8), and numbers of observations n = (200, 500, 800), repeating each setting 100 times. The simulation study found that overdispersion was always accompanied by zero excess, but not vice versa, and that the greater the value of λ, the greater the dispersion coefficient. ZINB regression was able to overcome overdispersion for all combinations of λ, p, and n, as shown by the dispersion coefficient τ falling below 1 after ZINB regression in all conditions.
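Why zero excess produces overdispersion can be seen from the moments of a zero-inflated Poisson: the mean is (1 - pi) * lam and the variance is (1 - pi) * lam * (1 + pi * lam), so the variance-to-mean ratio is 1 + pi * lam. For a fixed pi this ratio grows with lam, in line with the simulation finding above. The sketch below checks the closed form against direct summation over the pmf; pi = 0.4 is a hypothetical value, and this ratio is not necessarily the paper's dispersion coefficient τ.

```python
import math

def zip_moments(pi, lam):
    """Closed-form mean and variance of a zero-inflated Poisson (ZIP) distribution."""
    mean = (1 - pi) * lam
    var = (1 - pi) * lam * (1 + pi * lam)
    return mean, var

def zip_moments_numeric(pi, lam, y_max=200):
    """Same moments obtained by summing over the ZIP probability mass function."""
    probs = []
    for y in range(y_max + 1):
        poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
        probs.append((pi if y == 0 else 0.0) + (1 - pi) * poisson)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

pi = 0.4  # hypothetical structural-zero probability
for lam in (0.2, 0.4, 0.6, 0.8, 1.0, 5.0):  # the λ values listed in the abstract
    m, v = zip_moments(pi, lam)
    print(f"lambda={lam}: variance/mean = {v / m:.2f}")
```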
Poisson and negative binomial regression models for zero-inflated data: an experimental study
Communications Faculty Of Science University of Ankara Series A1 Mathematics and Statistics
Count data regression has been widely used in various disciplines, particularly in health. Classical models such as Poisson and negative binomial regression may not perform well in the presence of excessive zeros and overdispersion. Zero-inflated and hurdle variants of these models can remedy these problems. Besides zero-inflated and hurdle models, alternatives based on biased estimators such as ridge and Liu may improve performance in the presence of multicollinearity, a problem distinct from excessive zeros and overdispersion. In this study, ten different regression models, including classical Poisson and negative binomial regression and their zero-inflated, hurdle, ridge, and Liu variants, have been compared using health data. Criteria including the Akaike information criterion, log-likelihood, mean squared error, and mean absolute error have been used to evaluate the models' performance. The results show th...
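The comparison criteria listed above can be sketched as follows: AIC = 2k - 2 log L (smaller is better), and MSE/MAE computed here between observed counts and fitted means. The paper's exact definitions of MSE and MAE are not given in the excerpt, so this is an assumption; the toy data are hypothetical and fitted with an intercept-only Poisson model, whose MLE is the sample mean.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

def mse(observed, predicted):
    """Mean squared error between observed counts and fitted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def mae(observed, predicted):
    """Mean absolute error between observed counts and fitted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def poisson_loglik(counts, mu):
    """Log-likelihood of counts under a Poisson distribution with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

# Hypothetical toy data; intercept-only Poisson fit (MLE of mu is the sample mean)
counts = [0, 0, 0, 0, 0, 0, 1, 2, 5, 8]
mu_hat = sum(counts) / len(counts)
fitted = [mu_hat] * len(counts)
print(f"AIC = {aic(poisson_loglik(counts, mu_hat), n_params=1):.2f}")
print(f"MSE = {mse(counts, fitted):.2f}, MAE = {mae(counts, fitted):.2f}")
```

Each competing model (zero-inflated, hurdle, ridge, Liu) would supply its own maximized log-likelihood, parameter count, and fitted values to the same functions.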
A generalized negative binomial distribution based on an extended Poisson process
Brazilian Journal of Probability and Statistics, 2010
In this article we propose a generalized negative binomial distribution constructed from an extended Poisson process (a generalization of the homogeneous Poisson process). The distribution is intended to model discrete data in the presence of zero inflation and over-dispersion. For a dataset on animal abundance that exhibits over-dispersion and a high frequency of zeros, the extended distribution is compared with other distributions commonly used for this kind of data, and the comparison supports the fit of the proposed model.
Application of negative binomial modeling for discrete outcomes
Journal of Clinical Epidemiology, 2003
We present a case study using the negative binomial regression model for discrete outcome data arising from a clinical trial designed to evaluate the effectiveness of a prehabilitation program in preventing functional decline among physically frail, community-living older persons. The primary outcome was a measure of disability at 7 months that had a range from 0 to 16 with a mean of 2.8 (variance of 16.4) and a median of 1. The data were right skewed with clumping at zero (i.e., 40% of subjects had no disability at 7 months). Because the variance was nearly 6 times greater than the mean, the negative binomial model provided an improved fit to the data and accounted better for overdispersion than the Poisson regression model, which assumes that the mean and variance are the same. Although correcting the variance and corresponding test statistics for overdispersion is a standard procedure in the Poisson model, the estimates of the regression parameters are inefficient because they have more sampling variability than is necessary. The negative binomial model provides an alternative approach for the analysis of discrete data where overdispersion is a problem, provided that the model is correctly specified and adequately fits the data.
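The summary statistics in this abstract allow a quick back-of-the-envelope check. Assuming the common negative binomial parameterization Var(Y) = mu + mu^2/k, a method-of-moments estimate of k follows from the mean (2.8) and variance (16.4), and the implied probability of a zero can be set against the reported 40%. This ignores covariates and is not the authors' fitted regression model.

```python
import math

mean, variance = 2.8, 16.4  # marginal summary statistics reported in the abstract

# Assumed NB parameterization: Var(Y) = mu + mu**2 / k  =>  k = mu**2 / (Var - mu)
k_mom = mean**2 / (variance - mean)

p0_poisson = math.exp(-mean)              # Poisson P(Y = 0)
p0_nb = (k_mom / (k_mom + mean)) ** k_mom  # NB P(Y = 0)

print(f"variance/mean = {variance / mean:.2f}")
print(f"method-of-moments k = {k_mom:.3f}")
print(f"P(Y=0): Poisson {p0_poisson:.3f}, NB {p0_nb:.3f}, observed 0.40")
```

A Poisson with mean 2.8 implies only about 6% zeros, whereas the moment-matched negative binomial implies roughly 36%, much closer to the observed 40%, which illustrates why the negative binomial fit these data better.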
On Zero-Modified Poisson-Sujatha Distribution to Model Overdispersed Count Data
Austrian Journal of Statistics, 2018
In this paper we propose the zero-modified Poisson-Sujatha distribution as an alternative for modeling overdispersed count data exhibiting inflation or deflation of zeros. It will be shown that the zero modification can be incorporated by using the zero-truncated Poisson-Sujatha distribution. A simple reparametrization of the probability function allows us to represent the zero-modified Poisson-Sujatha distribution as a hurdle model. As a result, the proposed model can be fitted without any prior information about the zero modification present in a given dataset. Maximum likelihood theory will be used for parameter estimation and asymptotic inference. A simulation study will be conducted to evaluate some frequentist properties of the developed methodology. The usefulness of the proposed model will be illustrated using real datasets from the biological sciences and by comparing it with other models available in the literature.
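The hurdle reparametrization described above can be illustrated generically. A zero-modified distribution with baseline pmf f has P(0) = omega + (1 - omega) f(0) and P(y) = (1 - omega) f(y) for y > 0, where omega > 0 inflates zeros and a negative omega (down to -f(0)/(1 - f(0))) deflates them. Setting p0 = P(0) gives 1 - p0 = (1 - omega)(1 - f(0)), which is exactly the hurdle form. The sketch below uses a Poisson baseline as a stand-in, since the Poisson-Sujatha pmf is not given in the excerpt; all function names are illustrative.

```python
import math

def poisson_pmf(y, lam):
    """Stand-in baseline pmf (the paper uses Poisson-Sujatha instead)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zero_modified_pmf(y, omega, lam):
    """omega > 0 inflates zeros, omega < 0 deflates them (requires omega >= -f0/(1-f0))."""
    f = poisson_pmf(y, lam)
    return omega + (1 - omega) * f if y == 0 else (1 - omega) * f

def hurdle_pmf(y, p0, lam):
    """Hurdle form: P(0) = p0, positive counts from the zero-truncated baseline."""
    if y == 0:
        return p0
    return (1 - p0) * poisson_pmf(y, lam) / (1 - poisson_pmf(0, lam))

def omega_to_p0(omega, lam):
    """Map the zero-modification weight to the hurdle zero probability."""
    f0 = poisson_pmf(0, lam)
    return omega + (1 - omega) * f0

lam = 1.5
for omega in (0.3, -0.1):  # hypothetical inflation and deflation cases
    p0 = omega_to_p0(omega, lam)
    print(f"omega={omega}: p0={p0:.4f} vs baseline f(0)={poisson_pmf(0, lam):.4f}")
```

Because p0 is simply a probability in (0, 1), the hurdle form can be fitted without deciding in advance whether zeros are inflated (p0 > f(0)) or deflated (p0 < f(0)).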
Zero inflated negative binomial-generalized exponential distribution and its applications
Songklanakarin Journal of Science and Technology, 2014
In this paper, we propose a new zero inflated distribution, namely, the zero inflated negative binomial-generalized exponential (ZINB-GE) distribution. The new distribution is used for count data with extra zeros and is an alternative for analyzing over-dispersed count data. Some characteristics of the distribution are given, such as the mean, variance, skewness, and kurtosis. The parameters of the ZINB-GE distribution are estimated by the maximum likelihood estimation (MLE) method. Simulated and observed data are employed to examine this distribution. The results show that the MLE method seems to have high efficiency for large sample sizes. Moreover, the mean square error of parameter estimation increases when the zero proportion is higher. For the real data sets, this new zero inflated distribution provides a better fit than the zero inflated Poisson and zero inflated negative binomial distributions.
Communications in Statistics - Simulation and Computation, 2017
Overdispersion is a problem encountered in the analysis of count data that can lead to invalid inference if unaddressed. Whether data are overdispersed is often decided by checking whether the ratio of the Pearson chi-square statistic to its degrees of freedom is greater than one; however, there is currently no fixed threshold for declaring the need for statistical intervention. We consider simulated cross-sectional and longitudinal datasets containing varying magnitudes of overdispersion caused by outliers or zero inflation, as well as real datasets, to determine an appropriate threshold value of this statistic that indicates when overdispersion should be addressed.
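The Pearson chi-square / degrees-of-freedom ratio discussed above can be computed after fitting a Poisson regression. A minimal pure-Python sketch, assuming a single covariate and a log link fitted by Newton-Raphson (equivalent to Fisher scoring for this canonical link), is shown below; the data and function names are hypothetical, and no particular threshold is implied.

```python
import math

def fit_poisson_1cov(x, y, iters=50, tol=1e-10):
    """Poisson regression log(mu) = b0 + b1*x fitted by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score vector X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # information matrix X^T diag(mu) X
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2 here)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)

# Hypothetical data: binary covariate, zero-heavy counts
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [0, 0, 0, 1, 4, 0, 0, 2, 6, 7]
b0, b1 = fit_poisson_1cov(x, y)
print(f"b0={b0:.4f}, b1={b1:.4f}, Pearson chi2/df={pearson_dispersion(x, y, b0, b1):.3f}")
```

With a binary covariate the fitted means equal the group means (1 and 3), giving a ratio of 10/3; whether that value "warrants intervention" is exactly the threshold question the study addresses.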