An Empirical Comparison of the Anova F-Test, Normal Scores Test and Kruskal-Wallis Test Under Violation of Assumptions
Related papers
The ANOVA F-Test Versus The Kruskal-Wallis Test: A Robustness Study
1974
Researchers are often in a dilemma as to whether parametric or nonparametric procedures should be cited when assumptions of the parametric methods are thought to be violated. Therefore, the Kruskal-Wallis test and the ANOVA F-test were empirically compared in terms of probability of a Type I error and power under various patterns of mean differences in combination with patterns of variance inequality and patterns of sample size inequality. The Kruskal-Wallis test was found to be competitive with the ANOVA F-test in terms of alpha but not for power. Power of the Kruskal-Wallis test was grossly affected in all but one situation for nonstepwise mean differences when sample sizes and variances were negatively related and when small levels of significance were utilized. The ANOVA F-test, however, was found to be generally robust for the types of specified mean differences.
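The kind of empirical comparison described in this abstract can be sketched as a small Monte Carlo experiment. The following is a minimal illustration in Python, not the study's own procedure: the function name `type1_error_rates`, the normal data, the group sizes and the standard deviations are all assumptions chosen for the example. All group means are equal, so every rejection counts as a Type I error.

```python
import numpy as np
from scipy import stats

def type1_error_rates(sizes, sds, n_sim=2000, alpha=0.05, seed=0):
    """Estimate rejection rates of the ANOVA F-test and the Kruskal-Wallis
    test when all group means are equal (each rejection is a Type I error)."""
    rng = np.random.default_rng(seed)
    reject_f = 0
    reject_kw = 0
    for _ in range(n_sim):
        # Normal groups with a common mean of 0 and group-specific SDs
        # (illustrative choice, not taken from the paper).
        groups = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, sds)]
        if stats.f_oneway(*groups).pvalue < alpha:
            reject_f += 1
        if stats.kruskal(*groups).pvalue < alpha:
            reject_kw += 1
    return reject_f / n_sim, reject_kw / n_sim

# Sample sizes and variances negatively related:
# the smallest group has the largest standard deviation.
f_rate, kw_rate = type1_error_rates(sizes=(5, 10, 20), sds=(4.0, 2.0, 1.0))
print(f"F-test: {f_rate:.3f}  Kruskal-Wallis: {kw_rate:.3f}")
```

Adding unequal group means to the same loop would turn the rejection rates into power estimates.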
Studies have shown that the ANOVA F-test performs poorly under heterogeneity of variances. It is therefore important to provide more information on its alternatives and other methods that can prove useful. As a general guideline, Welch's ANOVA is the best alternative, with a low Type I error rate in all cases of unequal population variances compared to the other methods used in this study. In addition to Welch's ANOVA, Marascuilo's alternative to this test gives a less accurate result but is simpler to calculate. As in Moder K. (2010), the Kruskal-Wallis test and Hotelling's T² test were taken into consideration. The Kruskal-Wallis test had a higher Type I error rate, similar to the ANOVA F-test. Hotelling's T² test had a significantly lower Type I error rate than the ANOVA F-test and the Kruskal-Wallis test. Depending on the number of observations available in a study, a multivariate analysis of variance using Hotelling's T² test is advisable. Otherwise, Welch's ANOVA is the better choice for a test with a lower Type I error rate.
Comparing the performance of modified Ft statistic with ANOVA and Kruskal Wallis test
Applied Mathematics & Information Sciences, 2013
ANOVA is a classical test statistic for testing the equality of groups. However, this test is very sensitive to nonnormality as well as variance heterogeneity. To overcome the problem of nonnormality, a robust method such as the Ft test statistic can be used, but this statistic performs well only when the assumption of homoscedasticity is met. This is due to the bias of the mean as a measure of central tendency. This study proposed a robust procedure, known as the modified Ft method, which combines the Ft statistic with one of the popular robust scale estimators MADn, Tn and LMSn. A simulation study was conducted to compare the robustness (Type I error) of the method with its parametric and nonparametric counterparts, ANOVA and Kruskal-Wallis respectively. This innovation enhances the ability of the modified Ft statistic to provide good control of Type I error rates. The findings were in favor of the modified Ft method, especially for skewed data. The performance of the method was demonstrated on real education data.
RIPS, 2019
Student’s t-test and classical F-test ANOVA rely on the assumptions that two or more samples are independent, and that independent and identically distributed residuals are normal and have equal variances between groups. We focus on the assumptions of normality and equality of variances, and argue that these assumptions are often unrealistic in the field of psychology. We underline the current lack of attention to these assumptions through an analysis of researchers’ practices. Through Monte Carlo simulations, we illustrate the consequences of performing the classic parametric F-test for ANOVA when the test assumptions are not met on the Type I error rate and statistical power. Under realistic deviations from the assumption of equal variances, the classic F-test can yield severely biased results and lead to invalid statistical inferences. We examine two common alternatives to the F-test, namely the Welch’s ANOVA (W-test) and the Brown-Forsythe test (F*-test). Our simulations show that under a range of realistic scenarios, the W-test is a better alternative and we therefore recommend using the W-test by default when comparing means. We provide a detailed example explaining how to perform the W-test in SPSS and R. We summarize our conclusions in practical recommendations that researchers can use to improve their statistical practices.
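The abstract describes examples in SPSS and R. As a complement, here is a minimal Python sketch of Welch's W-test using the standard textbook formula. The function name `welch_anova` and the sample data are hypothetical, and this is not the authors' code.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA (W-test), which does not assume equal variances.
    Returns (F statistic, numerator df, denominator df, p-value)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([g.size for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    # Each group is weighted by the inverse variance of its mean.
    w = n / variances
    w_sum = w.sum()
    weighted_mean = np.sum(w * means) / w_sum

    lam = np.sum((1.0 - w / w_sum) ** 2 / (n - 1))
    numerator = np.sum(w * (means - weighted_mean) ** 2) / (k - 1)
    denominator = 1.0 + 2.0 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3.0 * lam)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value

# Hypothetical data: three groups with visibly different spreads.
g1 = [4.1, 5.0, 4.7, 5.3, 4.9]
g2 = [6.2, 9.8, 3.1, 12.4, 7.7, 5.5]
g3 = [5.0, 5.2, 4.8, 5.1]
print(welch_anova(g1, g2, g3))
```

With two groups, this statistic reduces to the square of Welch's t statistic, which makes a convenient sanity check against `scipy.stats.ttest_ind(..., equal_var=False)`.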
Kafkas Universitesi Veteriner Fakultesi Dergisi, 2009
We compared Analysis of Variance (F) and the Welch test (W) with their respective permutation versions (PF and PW) in terms of Type I error rate (α) and test power (1-β) using Monte Carlo simulation. Simulation results showed that when the variances were homogeneous, the permutation versions of the F and W tests displayed more reliable results in terms of protecting the Type I error rate at the nominal level, regardless of distribution shape and sample size. Violation of homogeneity of variances adversely affected all tests. Regardless of sample size and effect size, the PF test was slightly more powerful than the F test as long as the variances were homogeneous and the distributions were skewed (χ²(3) and Exp [0.75]). The PF and F tests had similar power levels when the distributions were symmetrical (Beta (5.5)). The W test was more powerful with homogeneous variances, while the PW test was slightly superior with heterogeneous variances except for unbalanced sample sizes (i.e., 5:10:15).
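A permutation version of the F test can be sketched as follows. This is a generic label-shuffling scheme written in Python for illustration and may differ from the procedure the authors used. The function name `permutation_f_test`, the number of permutations and the data are assumptions.

```python
import numpy as np
from scipy import stats

def permutation_f_test(groups, n_perm=2000, seed=0):
    """Permutation p-value for the one-way ANOVA F statistic.
    Group labels are shuffled while the group sizes are kept fixed."""
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    split_points = np.cumsum(sizes)[:-1]

    observed = stats.f_oneway(*groups).statistic
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        perm_groups = np.split(shuffled, split_points)
        if stats.f_oneway(*perm_groups).statistic >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical unbalanced design with sizes 5:10:15.
rng = np.random.default_rng(1)
data = [rng.exponential(1.0, size=n) for n in (5, 10, 15)]
f_obs, p_perm = permutation_f_test(data, n_perm=500)
print(f_obs, p_perm)
```

Replacing the F statistic inside the loop with Welch's statistic would give the analogous permutation W test.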
On the Development of an Exponentiated F Test for One-way ANOVA in the Presence of Outlier(s)
Mathematics and Statistics, 2016
The classical Fisher-Snedecor test, which compares several population means, depends on underlying assumptions that include independence of populations, constant variance and absence of outliers, among others. Arguably, the source of violation of some of these assumptions is the outlier, which leads to unequal variances across populations and consequently to the failure of the classical F-test to reach the correct decision about the null hypothesis. A series of robust tests have been proposed to ameliorate these lapses, with some degree of inaccuracy and limitation in terms of inflated Type I error and power for different combinations of parameters at various sample sizes, while still using the conventional F-table. This study focuses on developing a robust F-test, called the exponentiated F test, which introduces one shape parameter to the conventional F-distribution and is capable of making ANOVA decisions that are robust to the existence of outliers. The performance of the robust F test was compared with existing F-tests in the literature using the power of the test. Real-life and simulated data were used to illustrate the applicability and efficiency of the proposed distribution over the existing ones. Experimental data with balanced and unbalanced designs, with k=3 and k=5 populations, were simulated with 10000 replications, and varying degrees of outliers were injected randomly. The results obtained indicate that the proposed exponentiated F test is uniformly more powerful than the conventional F-tests for analysis of variance in the presence of outliers and is therefore recommended for use by researchers.
Understanding the Practical Advantages of Modern ANOVA Methods
Journal of Clinical Child & Adolescent Psychology, 2002
Examines the fundamental problems associated with standard hypothesis testing techniques. This article explains why many articles have failed to detect problems due to nonnormality and discusses the basics of modern methods aimed at correcting these problems.
Why Psychologists Should Always Report the W-test Instead of the F-Test ANOVA
2018
Student's t-test and classical F-test ANOVA rely on the assumptions that two or more samples are independent, and that independent and identically distributed residuals are normal and have equal variances between groups. We focus on the assumptions of normality and equality of variances, and argue that these assumptions are often unrealistic in the field of psychology. We underline the current lack of attention to these assumptions through an analysis of researchers' practices. Through Monte Carlo simulations we illustrate the consequences of performing the classic parametric F-test for ANOVA when the test assumptions are not met on the Type I error rate and statistical power. Under realistic deviations from the assumption of equal variances the classic F-test can yield severely biased results and lead to invalid statistical inferences. We examine two common alternatives to the F-test, namely the Welch's ANOVA (W-test) and the Brown-Forsythe test (F*-test). Our simulations show that under a range of realistic scenarios the W-test is a better alternative and we therefore recommend using the W-test by default when comparing means. We provide a detailed example explaining how to perform the W-test in SPSS and R. We summarize our conclusions in practical recommendations that researchers can use to improve their statistical practices.
A SIMULATION STUDY ON TESTS FOR ONE WAY ANOVA UNDER THE UNEQUAL VARIANCE ASSUMPTION
communications.science.ankara.edu …
Commun. Fac. Sci. Univ. Ank. Series A1, Volume 59, Number 2, Pages 15-34 (2010), ISSN 1303-5991. Abstract. The classical F-test to compare several population means depends on the assumptions of homogeneity of variance of the populations and normality. When these assumptions, especially the equality of variance, are dropped, the classical F-test fails to reject the null hypothesis even if the data actually provide strong evidence for rejecting it. This can be considered a serious problem in some applications, especially when the sample size is not large. To deal with this problem, a number of tests are available in the literature. In this study, the Brown-Forsythe, Weerahandi's Generalized F, Parametric Bootstrap, Scott-Smith, One-Stage, One-Stage Range, Welch and Xu-Wang's Generalized F-tests are introduced and a simulation study is performed to compare these tests according to Type I errors and powers in different combinations of parameters and various sample sizes.
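Of the tests listed, the Brown-Forsythe test for equality of means has a simple closed form. Below is a minimal Python sketch based on the standard textbook formula with a Satterthwaite-type denominator degrees of freedom. The function name `brown_forsythe_anova` and the data are hypothetical, and this is not the paper's code. It is a different test from the Brown-Forsythe test for equal variances, which SciPy exposes as `scipy.stats.levene(..., center='median')`.

```python
import numpy as np
from scipy import stats

def brown_forsythe_anova(*groups):
    """Brown-Forsythe F* test for equality of means under unequal variances.
    Returns (F* statistic, numerator df, denominator df, p-value)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([g.size for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    total_n = n.sum()
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares, as in the classical F-test.
    numerator = np.sum(n * (means - grand_mean) ** 2)
    # Each group variance is weighted by (1 - n_i / N).
    weights = (1.0 - n / total_n) * variances
    denominator = weights.sum()
    f_star = numerator / denominator

    c = weights / denominator
    df1 = k - 1
    df2 = 1.0 / np.sum(c ** 2 / (n - 1))
    p_value = stats.f.sf(f_star, df1, df2)
    return f_star, df1, df2, p_value

# Hypothetical data with unequal sizes and spreads.
x1 = [10.2, 11.5, 9.8, 10.9]
x2 = [12.0, 18.5, 7.4, 15.1, 9.9, 13.3, 16.8]
x3 = [11.1, 10.7, 11.4, 10.9, 11.0]
print(brown_forsythe_anova(x1, x2, x3))
```

With equal group sizes, F* coincides with the classical F statistic and only the denominator degrees of freedom differ.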