New Test Statistics To Assess The Goodness-Of-Fit Of Logistic Regression Models

A Comparison Study of Goodness of Fit Tests of Logistic Regression in R: Simulation and Application to Breast Cancer Data

Academic Journal of Applied Mathematical Sciences

Goodness-of-fit (GOF) tests for logistic regression assess how suitable the model is for the data. The null hypothesis of every GOF test is that the model fits. R, a free software environment, offers many GOF tests across different packages. A Monte Carlo simulation was conducted to study two situations: first, the ability of each test, under its default settings, to accept the null hypothesis when the model truly fits; second, the power of these tests when the assumption of a sufficient linear combination of the explanatory variables is violated (by omitting a linear covariate term, a quadratic term, or an interaction term). The study also checked whether the same test in different R packages gave the same results. Because sample size is expected to affect the simulation results, the pattern of change in GOF test results under different sample sizes and different model settings was estimated. All tests accept the null hypothesis (more than 95% of simulation ...

Goodness‐of‐fit processes for logistic regression: simulation results

Statistics in Medicine, 2002

In this paper we use simulations to compare the performance of new goodness-of-fit tests based on weighted statistical processes to three currently available tests: the Hosmer-Lemeshow decile-of-risk test, the Pearson chi-square test, and the unweighted sum-of-squares test. The simulations demonstrate that all tests have the correct size. The power for all tests to detect lack-of-fit due to an omitted quadratic term with a sample of size 100 is close to or exceeds 50 per cent to detect moderate departures from linearity and is over 90 per cent for these same alternatives for sample size 500. All tests have low power with sample size 100 to detect lack-of-fit due to an omitted interaction between a dichotomous and continuous covariate, while the power exceeds 80 per cent to detect extreme interaction with a sample size of 500. The power is low to detect any alternative link function with sample size 100 and for most alternative links for sample size 500. Only in the case of sample size 500 and an extremely asymmetric link function is the power over 80 per cent. The results from these simulations show that no single test, new or current, performs best in detecting lack-of-fit due to an omitted covariate or incorrect link function. However, one of the new weighted tests has power comparable to other tests in all settings simulated and had the highest power in the difficult case of an omitted interaction term. We illustrate the tests within the context of a model for factors associated with abstinence from drug use in a randomized trial of residential treatment programmes. We conclude the paper with a summary and specific recommendations for practice.
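The Hosmer-Lemeshow decile-of-risk test named in this abstract can be sketched as follows. This is a minimal illustration, not the code used in the paper: the function names `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical, the grouping uses equal-sized groups of sorted fitted probabilities, and the p-value uses the conventional reference distribution with (groups - 2) degrees of freedom, which presumes the probabilities come from a model fitted to the same data. The toy data at the end are made up purely to show the call.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term            # adds (x/2)^i / i!
        term *= half / (i + 1)
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, p_value) for a decile-of-risk style test.

    y: observed 0/1 outcomes; p: fitted probabilities strictly between 0 and 1.
    Assumes len(y) >= groups and an even value of groups - 2.
    """
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # (O - E)^2 / (n_g * pbar * (1 - pbar)) with pbar = E / n_g
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return stat, chi2_sf_even_df(stat, groups - 2)


# Toy illustration with made-up outcomes and probabilities (4 groups of 2).
toy_p = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8]
toy_y = [0, 1, 0, 1, 1, 0, 1, 1]
stat, p_value = hosmer_lemeshow(toy_y, toy_p, groups=4)
print(f"HL statistic = {stat:.4f}, p-value = {p_value:.4f}")
```

A large p-value here only means the grouped observed and expected counts are not far apart; as the abstract notes, power to detect some departures can be low at small sample sizes.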

Asymptotic Distribution of Goodness-of-Fit Tests in Logistic Regression Model

Open Journal of Statistics

The logistic regression model has become commonly used to study the association between a binary response variable and explanatory variables; its widespread application rests on its ease of application and interpretation. The assessment of goodness-of-fit in the logistic regression model has attracted the attention of many scientists and researchers. Goodness-of-fit tests are methods to determine the suitability of the fitted model. Many methods have been proposed and discussed for assessing goodness-of-fit in the logistic regression model; however, the asymptotic distribution of goodness-of-fit statistics has been less examined and needs more investigation. This work focuses on assessing the behavior of the asymptotic distribution of goodness-of-fit tests, compares global goodness-of-fit tests, and evaluates them by simulation.

Goodness of Fit in Logistic Regression

As in linear regression, goodness of fit in logistic regression attempts to get at how well a model fits the data. It is usually applied after a "final model" has been selected. As we have seen, often in selecting a model no single "final model" is selected, as a series of models are fit, each contributing towards final inferences and conclusions. In that case, one may wish to see how well more than one model fits, although it is common to just check the fit of one model. This is not necessarily bad practice, because if there are a series of "good" models being fit, often the fit from each will be similar. Recall once again the quote from George Box: "All models are wrong, but some are useful." It is not clear how to judge the fit of a model that we know is in fact wrong. Much of the goodness of fit literature is based on hypothesis testing of the following type: H0: the model is exactly correct; HA: the model is not exactly correct. This type of testing provides no useful information. If the null hypothesis is rejected, then we have learned nothing, because we already knew that it is impossible for any model to be "exactly correct". On the other hand, if we do not reject the model, it is almost surely because of a lack of statistical power, and as the sample size grows larger, we will eventually surely reject H0. These tests can be seen not only as not useful, but as harmful if non-rejection of a null hypothesis is misinterpreted as proof that the model "fits well", which of course can be far from the truth. If these tests are not useful (despite their popularity in some circles), what else can we do? We can attempt to derive various descriptive measures of how well a model fits,

Investigating the power of goodness-of-fit tests for multinomial logistic regression

Communications in Statistics - Simulation and Computation, 2017

Goodness-of-fit tests are important to assess whether the model fits the data. In this paper we investigate the Type I error and power of two goodness-of-fit tests for multinomial logistic regression via a simulation study. The GoF test using a partitioning strategy (clustering) in the covariate space was compared with another test, C g, which is based on grouping of predicted probabilities. The power of both tests was investigated when a quadratic term or an interaction term was omitted from the model. The proposed test 2 *G p shows good Type I error and ample power except for models with a highly skewed covariate distribution. The

The exponential generalized log-logistic model: Bagdonavičius-Nikulin test for validation and non-Bayesian estimation methods

Communications for Statistical Applications and Methods

A modified Bagdonavičius-Nikulin chi-square goodness-of-fit test is defined and studied. The lymphoma data are analyzed using the modified goodness-of-fit test statistic. Different non-Bayesian estimation methods under complete-sample schemes are considered, discussed and compared, such as the maximum likelihood least square estimation method, the Cramér-von Mises estimation method, the weighted least square estimation method, the left-tail Anderson-Darling estimation method and the right-tail Anderson-Darling estimation method. Numerical simulation studies are performed to compare these estimation methods. The potential of the new model is illustrated using three real data sets and compared with many other well-known generalizations.

Validation and Performance Analysis of Binary Logistic Regression Model

2010

Application of logistic regression modeling techniques without subsequent performance analysis regarding predictive ability of the fitted model can result in poorly fitting results that inaccurately predict outcomes on new subjects. Model validation is possibly the most important step in the model building sequence. Model validity refers to the stability and reasonableness of the logistic regression coefficients, the plausibility and usability of the fitted logistic regression function, and the ability to generalize inferences drawn from the analysis. The aim of this study is to evaluate and measure how effectively the fitted logistic regression model describes the outcome variable both in the sample and in the population. A straightforward and fairly popular split-sample approach has been used here to validate the model. Different summary measures of goodness-of-fit and other supplementary indices of predictive ability of the fitted model indicate that the fitted binary logistic re...
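The split-sample approach mentioned in this abstract can be sketched roughly as below. Everything here is an illustrative assumption rather than the study's procedure: the data are simulated, the model has a single covariate, the helper names `fit_logistic` and `brier_score` are hypothetical, the split is a random 50/50 partition, and the Brier score is just one possible summary of predictive ability, chosen here for simplicity.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (one covariate)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def brier_score(xs, ys, b0, b1):
    """Mean squared difference between outcomes and predicted probabilities."""
    return sum((y - sigmoid(b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Simulated data (assumed true model: logit = -0.5 + 2.0 * x).
random.seed(0)
n = 400
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if random.random() < sigmoid(-0.5 + 2.0 * x) else 0 for x in xs]

# Random 50/50 split into a development half and a validation half.
indices = list(range(n))
random.shuffle(indices)
train, valid = indices[: n // 2], indices[n // 2:]
train_x, train_y = [xs[i] for i in train], [ys[i] for i in train]
valid_x, valid_y = [xs[i] for i in valid], [ys[i] for i in valid]

# Fit on the development half only, then score both halves.
b0, b1 = fit_logistic(train_x, train_y)
brier_train = brier_score(train_x, train_y, b0, b1)
brier_valid = brier_score(valid_x, valid_y, b0, b1)
print(f"b0={b0:.3f}, b1={b1:.3f}")
print(f"Brier (development)={brier_train:.3f}, Brier (validation)={brier_valid:.3f}")
```

A validation score close to the development score is the kind of evidence a split-sample check looks for; a markedly worse validation score would suggest the fitted model does not generalize well to new subjects.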

Comparison Of Statistical Tests In Logistic Regression: The Case Of Hypernatreamia

Journal of Modern Applied Statistical Methods

Logistic regression has become an integral component of any medical data analysis concerning binary responses. The main issue arising after the final model is adopted is its goodness-of-fit. The fit of the model is assessed via overall measures and summary statistics, which are compared in the case of hypernatraemia.

A New Goodness-of-Fit Test: Free Chi-Square (FCS)

GAZI UNIVERSITY JOURNAL OF SCIENCE, 2021

Highlights • This paper presents a new goodness-of-fit test. • The proposed method is a binning-free and distribution-free test. • It is more sensitive, easy to use and fast.

Power and Type I error rates of goodness-of-fit statistics for binomial generalized estimating equations (GEE) models

Computational Statistics & Data Analysis, 2006

Binary outcomes are very common in medical studies. Logistic regression is typically used to analyze independent binary outcomes while generalized estimating equations regression methods (GEE) are often used to analyze correlated binary data. Several goodness-of-fit (GoF) statistics for the GEE methods have been developed recently. The objective of this study is to compare the power and Type I error rates of existing GEE GoF statistics using simulated data under different conditions. The number of clusters was varied in each condition. Different tested models included discrete, continuous, observation-specific and/or cluster-specific covariates. Two or three observations per cluster were generated with various correlations between observations.