Testing Approaches for Overdispersion in Poisson Regression versus the Generalized Poisson Model
Related papers
Alternative Models in Overcoming the Problem of Overdispersion in Poisson Regression
Jurnal TAMBORA
This study aims to compare alternative models for overcoming overdispersion in Poisson regression modeling. The models compared are the Generalized Poisson, Negative Binomial, and Generalized Negative Binomial models. They are applied to the number of poor people in Central Java in 2021, with unemployment, the Human Development Index (HDI), and Gross Regional Domestic Product (GRDP) as independent variables. The Generalized Poisson model performs better than the Negative Binomial and Generalized Negative Binomial models, with smaller AIC and BIC values and a larger R². The simultaneous test indicates that unemployment, HDI, and GRDP jointly have a significant effect on the number of poor people. In the partial tests, only unemployment and HDI significantly affect the number of poor people in Central Java; there is not enough evidence that GRDP affects it. Comprehensive and relevant policies are needed to reduce the number of poor people in an area.
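Since all three models exist to relax the Poisson equality of mean and variance, it may help to see how each one links the variance to the mean. A minimal Python sketch, assuming the NB2 form Var(Y) = μ + αμ² for the negative binomial and the mean parametrization Var(Y) = μ(1 + αμ)² for the generalized Poisson; the function names and α = 0.5 are illustrative choices, not values from the study. The generalized negative binomial is left out because its variance depends on which parametrization is adopted.

```python
def poisson_var(mu: float) -> float:
    # Poisson: variance equals the mean (equidispersion)
    return mu


def nb2_var(mu: float, alpha: float) -> float:
    # Negative binomial (NB2 form): variance grows quadratically in the mean
    return mu + alpha * mu ** 2


def gp_var(mu: float, alpha: float) -> float:
    # Generalized Poisson (mean parametrization): alpha > 0 overdispersion,
    # alpha = 0 Poisson, alpha < 0 (within its allowed range) underdispersion
    return mu * (1 + alpha * mu) ** 2


if __name__ == "__main__":
    alpha = 0.5  # illustrative value, not estimated from any data
    for mu in (1.0, 5.0, 20.0):
        print(f"mu={mu:5.1f}  Poisson={poisson_var(mu):8.2f}  "
              f"NB2={nb2_var(mu, alpha):8.2f}  GP={gp_var(mu, alpha):9.2f}")
```

Note that α plays a different role in each formula, so equal values of α are not directly comparable across the models.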
Communications in Statistics - Simulation and Computation, 2017
Overdispersion is a problem encountered in the analysis of count data that can lead to invalid inference if left unaddressed. Whether data are overdispersed is often decided by checking whether the ratio of the Pearson chi-square statistic to its degrees of freedom is greater than one; however, there is currently no fixed threshold for declaring the need for statistical intervention. We consider simulated cross-sectional and longitudinal datasets containing varying magnitudes of overdispersion caused by outliers or zero inflation, as well as real datasets, to determine an appropriate threshold value of this statistic that indicates when overdispersion should be addressed.
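The ratio described above is the Pearson chi-square statistic, Σ(yᵢ - μ̂ᵢ)²/μ̂ᵢ, divided by its residual degrees of freedom n - p. A minimal sketch in plain Python, assuming the fitted means μ̂ᵢ come from a Poisson regression fitted elsewhere; the function name and the toy numbers are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by its residual degrees of freedom.

    y        : observed counts
    mu       : fitted means from a Poisson regression (fitted elsewhere)
    n_params : number of estimated coefficients, including the intercept
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    # Sum of squared Pearson residuals (y_i - mu_i) / sqrt(mu_i)
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    df = len(y) - n_params  # residual degrees of freedom
    return chi2 / df


# Toy numbers, purely illustrative
y = [0, 2, 5, 1, 9, 0, 3, 7]
mu = [1.5, 2.5, 3.0, 2.0, 4.0, 1.0, 3.5, 4.5]
ratio = pearson_dispersion(y, mu, n_params=2)
print(f"Pearson chi2 / df = {ratio:.3f}")
```

Because the abstract stresses that there is no fixed cut-off, the sketch only reports the ratio instead of applying a threshold to it.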
Analysis Of Overdispersed Count Data By Poisson Model
2021
A commonly violated assumption of the Poisson model is equidispersion; its violation, overdispersion, is a condition in which the variance of the response variable is larger than its mean. The aim of this research is to analyze Poisson-type models, namely Poisson Regression (POI), Zero-Inflated Poisson Regression (ZIP), Generalized Poisson Regression (GP) and Zero-Inflated Generalized Poisson Regression (ZIGP), for overdispersed data. The data come from the 2017 Indonesian Demographic and Health Survey (SKDI): a total of 17,212 families, with child mortality in each family as the response variable. Model parameters are estimated by maximum likelihood (MLE). The analysis of these four models shows that overdispersion makes the POI model less appropriate, while the GP model can be used for overdispersed data; however, if the overdispersion is caused by excess zeros in the data, GP wi...
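The definition above (variance larger than the mean) can be checked crudely on the raw response before any model is fitted. A minimal sketch using only Python's standard library; the function name and the counts are hypothetical. It also returns Fisher's index-of-dispersion statistic, (n - 1) times the variance-to-mean ratio, which is approximately chi-square with n - 1 degrees of freedom for an i.i.d. Poisson sample. Because it ignores covariates, a large value here does not by itself show that a Poisson regression is overdispersed.

```python
from statistics import mean, variance


def dispersion_index(y):
    """Sample variance / sample mean, and Fisher's dispersion statistic.

    Under an i.i.d. Poisson sample the index is close to 1, and
    (n - 1) * index is approximately chi-square with n - 1 degrees of freedom.
    This marginal check ignores covariates, so it is only a rough first look.
    """
    values = [float(v) for v in y]
    m = mean(values)
    idx = variance(values) / m  # variance() uses the n - 1 denominator
    stat = (len(values) - 1) * idx
    return idx, stat


counts = [0, 0, 1, 2, 7]  # illustrative counts
idx, stat = dispersion_index(counts)
print(f"variance/mean = {idx:.2f}, chi-square statistic = {stat:.2f} "
      f"on {len(counts) - 1} df")
```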
Detecting overdispersion in count data: A zero-inflated Poisson regression analysis
Journal of Physics: Conference Series, 2017
This study focuses on analysing count data on butterfly communities in Jasin, Melaka. For a count dependent variable, the Poisson regression model is known as the benchmark regression model. Extending previous work that used Poisson regression, this study applies zero-inflated Poisson (ZIP) regression to analyse the butterfly community counts more precisely. When extra zeros are present, Poisson regression should be abandoned in favour of count data models that explicitly account for them, and the ZIP regression model is one of the most popular such models. The butterfly community data, referred to in this study as the number of subjects, were collected in Jasin, Melaka, and comprise 131 subject visits. Because the counts are numbers of subjects, the data set covers five butterfly families, which form the five variables (types of subjects) in the analysis. The ZIP analysis used a SAS procedure to assess overdispersion in the zero values, and the main purpose of extending the previous study is to determine which model is better when the count data contain zero observations. Models were compared using AIC, BIC and the Vuong test at the 5% significance level. The findings indicate overdispersion in the presence of zero values, and the ZIP regression model is better than the Poisson regression model when zero values exist.
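The Vuong test mentioned above compares two models through the per-observation differences in log-likelihood, mᵢ = log f₁(yᵢ) - log f₂(yᵢ), using V = √n · mean(m) / sd(m). A minimal sketch, assuming ZIP as model 1 and Poisson as model 2 with hand-picked rather than fitted parameters; the data, parameter values and function names are illustrative, and the AIC- or BIC-style corrections that some software applies to V are omitted.

```python
import math
from statistics import mean, stdev


def poisson_logpmf(y, lam):
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def zip_logpmf(y, lam, pi):
    # Zero-inflated Poisson: an extra (structural) zero with probability pi
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + poisson_logpmf(y, lam)


def vuong_statistic(y, lam_zip, pi, lam_pois):
    """Uncorrected Vuong statistic comparing ZIP (model 1) with Poisson (model 2).

    Positive values favour ZIP, negative values favour Poisson; |V| > 1.96
    corresponds to a two-sided 5% test against an N(0, 1) reference.
    """
    m = [zip_logpmf(v, lam_zip, pi) - poisson_logpmf(v, lam_pois) for v in y]
    return math.sqrt(len(m)) * mean(m) / stdev(m)


# Illustrative data with many zeros and hand-picked (not fitted) parameters
y = [0] * 10 + [3, 4, 2, 5, 3, 4, 2, 6, 3, 4]
v = vuong_statistic(y, lam_zip=3.6, pi=0.5, lam_pois=1.8)
print(f"Vuong V = {v:.2f}")
```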
A score test for overdispersion in zero-inflated poisson mixed regression model
Statistics in Medicine, 2007
Count data with extra zeros are common in many medical applications. The zero-inflated Poisson (ZIP) regression model is useful to analyse such data. For hierarchical or correlated count data where the observations are either clustered or represent repeated outcomes from individual subjects, a class of ZIP mixed regression models may be appropriate. However, the ZIP parameter estimates can be severely biased if the non-zero counts are overdispersed in relation to the Poisson distribution. In this paper, a score test is proposed for testing the ZIP mixed regression model against the zero-inflated negative binomial alternative. Sampling distribution and power of the test statistic are evaluated by simulation studies. The results show that the test statistic performs satisfactorily under a wide range of conditions. The test procedure is applied to pancreas disorder length of stay that comprised mainly same-day separations and simultaneous prolonged hospitalizations.
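The score test described above rests on the ZIP model being the limiting case of the zero-inflated negative binomial (ZINB) as the dispersion parameter α tends to 0, so testing ZIP against ZINB amounts to testing α = 0 on the boundary of the parameter space. The sketch below only illustrates that nesting numerically for a single count; it is not the paper's test statistic, and it omits the covariates and random effects of the mixed model. The NB2 parametrization (mean μ, Var = μ + αμ² for the count component) and the function names are assumptions.

```python
import math


def zip_pmf(y, mu, pi):
    # Zero-inflated Poisson probability mass function
    pois = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    return pi * (y == 0) + (1 - pi) * pois


def zinb_pmf(y, mu, pi, alpha):
    # Zero-inflated NB2 with mean mu and dispersion alpha > 0 (Var = mu + alpha*mu^2)
    r = 1.0 / alpha
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              - r * math.log1p(alpha * mu)
              + y * (math.log(alpha * mu) - math.log1p(alpha * mu)))
    return pi * (y == 0) + (1 - pi) * math.exp(log_nb)


# As alpha shrinks, the ZINB probability approaches the ZIP probability
for alpha in (0.5, 0.05, 1e-6):
    print(f"alpha={alpha:g}: ZINB P(Y=3)={zinb_pmf(3, 2.0, 0.3, alpha):.6f}, "
          f"ZIP P(Y=3)={zip_pmf(3, 2.0, 0.3):.6f}")
```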
Comparisons of tests of distributional assumption in Poisson regression model
Communications in Statistics - Simulation and Computation, 2016
Count data consist of discrete non-negative integer values. The Poisson regression model is one of the most popular models for count data; it assumes that the response variable follows a Poisson distribution. The purpose of this article is to assess this distributional assumption using several goodness-of-fit tests. The tests are compared with respect to type I error rate and power across different samples, parameter values and sample sizes. The simulation study suggests that the most powerful tests are generally the Dean-Lawless and Cameron-Trivedi score tests.
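The two tests singled out above can both be computed from the fitted means μ̂ᵢ of a Poisson regression. A minimal sketch, assuming an NB2-type alternative (Var = μ + αμ²) and fitted means supplied from elsewhere: the Dean-Lawless statistic is shown in its simple unadjusted form (small-sample refinements exist), and the Cameron-Trivedi version is the auxiliary no-intercept regression of ((yᵢ - μ̂ᵢ)² - yᵢ)/μ̂ᵢ on μ̂ᵢ. Function names and data are hypothetical.

```python
import math


def dean_lawless(y, mu):
    """Unadjusted Dean-Lawless score statistic (Poisson vs NB2 alternative).

    Approximately N(0, 1) under the Poisson null; large positive values
    point to overdispersion.
    """
    num = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    return num / math.sqrt(2 * sum(mi ** 2 for mi in mu))


def cameron_trivedi(y, mu):
    """Cameron-Trivedi auxiliary regression for the NB2 alternative.

    Regress z_i = ((y_i - mu_i)^2 - y_i) / mu_i on mu_i without an intercept
    and return (alpha_hat, t statistic for alpha = 0).
    """
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    sxx = sum(mi ** 2 for mi in mu)
    alpha_hat = sum(zi * mi for zi, mi in zip(z, mu)) / sxx  # OLS slope through origin
    resid = [zi - alpha_hat * mi for zi, mi in zip(z, mu)]
    s2 = sum(e ** 2 for e in resid) / (len(z) - 1)  # one estimated coefficient
    return alpha_hat, alpha_hat / math.sqrt(s2 / sxx)


# Toy observed counts and fitted Poisson means, purely illustrative
y = [0, 2, 5, 1, 9, 0, 3, 7]
mu = [1.5, 2.5, 3.0, 2.0, 4.0, 1.0, 3.5, 4.5]
print("Dean-Lawless T:", round(dean_lawless(y, mu), 3))
print("Cameron-Trivedi (alpha_hat, t):", cameron_trivedi(y, mu))
```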
A rich family of generalized Poisson regression models with applications
Mathematics and Computers in Simulation, 2005
The Poisson regression (PR) model is inappropriate for modeling over- or under-dispersed (or inflated) data. Several generalizations of the PR model have been proposed for modeling such data. In this paper, a rich family of generalized Poisson regression (GPR) models is reviewed in detail. The family has a wide range of applications in disciplines including agriculture, econometrics, patent applications, species abundance, medicine, and use of recreational facilities. To illustrate the usefulness of the family, several applications in different situations are given. For example, hospital discharge counts are modeled using GPR and other generalized models, and the fitted models show that household size, education, and income are positively related to hospital discharges for diagnosis-related groups (DRGs). One advantage of the family is that it lets the data determine which model is appropriate for a given situation. The results discussed in the paper are expected to enhance our understanding of various forms of count data originating from primary health care facilities and medical domains.
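One widely used mean parametrization of the generalized Poisson distribution (often associated with Famoye's restricted GP regression, and assumed here rather than taken from the paper) is f(y; μ, α) = (μ/(1 + αμ))^y (1 + αy)^(y-1) / y! · exp(-μ(1 + αy)/(1 + αμ)), with mean μ and variance μ(1 + αμ)². A minimal sketch that evaluates this pmf and checks its moments numerically; the function names and the values μ = 3, α = 0.5 are illustrative, and negative α (underdispersion) is deliberately not handled because it requires a truncated support.

```python
import math


def gp_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of the generalized Poisson in a mean parametrization.

    E[Y] = mu and Var[Y] = mu * (1 + alpha * mu)**2. Restricted here to
    alpha >= 0 (alpha = 0 gives the ordinary Poisson); negative alpha
    needs a truncated support and is not handled.
    """
    if mu <= 0 or alpha < 0:
        raise ValueError("requires mu > 0 and alpha >= 0")
    a = 1.0 + alpha * mu
    return (y * math.log(mu / a)
            + (y - 1) * math.log1p(alpha * y)
            - math.lgamma(y + 1)
            - mu * (1.0 + alpha * y) / a)


def gp_pmf(y: int, mu: float, alpha: float) -> float:
    return math.exp(gp_logpmf(y, mu, alpha))


# Numerical check of the moments for illustrative values mu = 3, alpha = 0.5
mu, alpha = 3.0, 0.5
probs = [gp_pmf(k, mu, alpha) for k in range(500)]
total = sum(probs)
m1 = sum(k * p for k, p in enumerate(probs))
var = sum((k - m1) ** 2 * p for k, p in enumerate(probs))
print(f"sum={total:.6f} mean={m1:.4f} var={var:.4f}")  # var close to 3 * 2.5**2 = 18.75
```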
Can Generalized Poisson model replace any other count data models? An evaluation
Clinical Epidemiology and Global Health
Background: Count data represent the number of occurrences of an event within a fixed period of time. In count data modelling, overdispersion is inevitable. Sometimes this overdispersion may be due not just to excess zeros but to the presence of two or more mixture components. Hence, the main objective is to examine whether mixtures are present alongside excess zeros, and to compare the Generalized Poisson model and mixture models with other count data models using real and simulated data. Methods: Three real overdispersed datasets were used to compare the models. The models for the real data were compared using information criteria such as AIC and BIC, and the regression coefficients. Data were also simulated from a mixture Poisson with excess zeros, and the simulation was repeated for different sample sizes to identify the better model. Results: The Generalized Poisson model showed consistently lower bias and MSE than the other models across sample sizes. AIC and BIC values were almost identical for the Generalized Poisson, ZIP and mixture Poisson models. Similar findings were obtained from the real data. Conclusion: Generalized Poisson models consistently provide a better fit for data overdispersed due to excess zeros, in both real and simulated data with varying sample sizes. Negative Binomial models can be restricted or re-evaluated against the Generalized Poisson model.
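The simulation design in the Methods, a mixture of Poisson components combined with excess zeros, can be sketched as a data-generating process. A minimal standard-library sketch; the number of components, their rates, the mixing weights, the zero-inflation probability and the function names are illustrative assumptions, not the settings used in the study. Poisson draws use Knuth's multiplication method, which is adequate only for small rates.

```python
import math
import random
from statistics import mean, variance


def rpois(lam, rng):
    # Knuth's multiplication method; fine for the small rates used here
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_zi_mixture(n, pi_zero, weights, lams, seed=0):
    """Draw n counts from a zero-inflated finite mixture of Poisson components."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        if rng.random() < pi_zero:
            out.append(0)  # excess (structural) zero
        else:
            comp = rng.choices(range(len(lams)), weights=weights)[0]
            out.append(rpois(lams[comp], rng))
    return out


# Illustrative settings only: 30% excess zeros, two components with rates 1 and 6
y = simulate_zi_mixture(5000, pi_zero=0.3, weights=[0.6, 0.4], lams=[1.0, 6.0], seed=1)
m = mean(y)
print(f"mean={m:.2f}  var/mean={variance(y) / m:.2f}  "
      f"zero share={y.count(0) / len(y):.3f} vs Poisson {math.exp(-m):.3f}")
```

Comparing the observed share of zeros with exp(-mean), the zero probability of a single Poisson with the same mean, shows how far such data depart from the Poisson model.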
International Journal of Scientific & Technology Research, 2013
Data on the number of cervical cancer cases are discrete (count) data, which are usually analyzed with Poisson regression. Poisson regression requires the mean and variance to be equal, whereas in practice count data often have a variance greater than the mean, a condition referred to as overdispersion. To deal with overdispersion, the data can be modelled with Generalized Poisson Regression (GPR) or Negative Binomial Regression, neither of which requires the mean to equal the variance. The GPR model yields an AIC of 317.70, while the negative binomial regression model yields an AIC of 312.43. The negative binomial regression model is therefore chosen as the best model because it has the smaller AIC.
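The selection rule above can be made concrete. With k estimated parameters, n observations and maximized log-likelihood log L, AIC = -2 log L + 2k and BIC = -2 log L + k log n, and the model with the smaller value is preferred. A minimal sketch; since the abstract reports only AIC values and not the log-likelihoods, the usage line simply applies the smallest-AIC rule to the two reported numbers. Function names are hypothetical.

```python
import math


def aic(loglik: float, k: int) -> float:
    # Akaike information criterion: -2 log L + 2k
    return -2.0 * loglik + 2 * k


def bic(loglik: float, k: int, n: int) -> float:
    # Bayesian information criterion: -2 log L + k log(n)
    return -2.0 * loglik + k * math.log(n)


def best_model(scores: dict) -> str:
    # Under either criterion, the model with the smallest value is preferred
    return min(scores, key=scores.get)


# AIC values as reported in the abstract above
reported_aic = {"GPR": 317.70, "Negative binomial": 312.43}
print(best_model(reported_aic))  # Negative binomial
```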