OneWay ANOVA (original) (raw)

One-Way ANOVA

Last Updated : 23 Jan, 2026

Analysis of Variance (ANOVA) is a parametric statistical method used to determine whether there is a significant difference among the means of three or more groups by testing the null hypothesis that all group means are equal.

one_way_anova

One Way Anova

One-way ANOVA is the simplest form of ANOVA used when a single independent variable has three or more groups. It determines whether there are statistically significant differences among the group means by comparing within-group and between-group variation.

One-way ANOVA analyzes the effect of a single factor on multiple independent groups.
It is commonly used due to its simplicity and efficiency.
The test indicates whether at least two group means differ, but does not identify which ones.
It is mainly applied when three or more groups are compared.
The relationship between one-way ANOVA and the t-test is given by F=t^{2}

Assumptions of ANOVA

The dependent variable is approximately normally distributed within each group, especially important for small sample sizes.
Observations are randomly selected and independent of one another.
All groups have equal variances (homogeneity of variance).
Each data point belongs to only one group with no overlap.
For two-way ANOVA, the effects of independent variables are additive and there is no significant interaction between them.

When to Use One-Way ANOVA

A one-way Analysis of Variance (ANOVA) is used when you want to examine the effect of a single categorical independent variable on a quantitative dependent variable. The independent variable must consist of at least three distinct levels or groups.

One-way ANOVA determines whether the mean of the dependent variable differs significantly across the levels of the independent variable. Typical examples include:

Website design version (Design A, Design B, Design C) as the independent variable and user engagement time as the dependent variable.
Machine learning model type (Logistic Regression, SVM, Random Forest) as the independent variable and classification accuracy as the dependent variable.
Social media usage level (low, medium, high) as the independent variable and average hours of sleep per night as the dependent variable.

The null hypothesis (H_{0}) states that all group means are equal, indicating no effect of the independent variable. The alternative hypothesis (H_{a}) states that at least one group mean differs significantly from the others.

How to Perform One-Way ANOVA

One-way ANOVA is a hypothesis test used to determine whether the means of three or more groups differ significantly based on a single factor. The test statistic used is the F-statistic which compares between-group variance to within-group variance.

Step 1: Define Hypotheses

*Null hypothesis(H_{0}*): All group population means are equal

\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k

**Alternative hypothesis (H_{a}): At least one group mean differs.

This step clarifies what you are testing and what outcome would lead you to reject the null.

Step 2: Compute Degrees of Freedom

Degrees of freedom (df) help determine the critical F-value from statistical tables.

Between groups: df_{\text{between}} = k - 1

Within groups: df_{within} = n - k

df_{total} = df_{between} + df_{within}

where

k: number of groups
n: total number of observations

Step 3: Understand the F-Statistic

The F-statistic is the ratio of variability between groups to variability within groups:

F = \frac{\text{Variance between groups}}{\text{Variance within groups}}

A larger F-value indicates that group means differ more than expected by chance.

Step 4: Calculate Group Means and Grand Mean

Compute the mean of each group. Then calculate the grand mean across all observations:

\mu_{grand} = \frac{\sum G}{n}

where

{\sum G} is the sum of all observations
{n} is the total sample size Python `

import numpy as np

team_A = [50, 47, 52, 46, 51, 48, 49, 47, 50] team_B = [40, 42, 38, 41, 39, 40, 41, 39, 40] team_C = [55, 54, 57, 53, 56, 55, 55, 54, 57]

data = [team_A, team_B, team_C]

group_means = [np.mean(team) for team in data] overall_mean = np.mean([x for team in data for x in team])

print("Group Means:", dict(zip(['Team A','Team B','Team C'], group_means))) print("Overall Mean:", round(overall_mean, 2))

**Output:

Group Means: {'Team A': np.float64(48.888888888888886), 'Team B': np.float64(40.0), 'Team C': np.float64(55.111111111111114)}

Overall Mean: 48.0

Step 5: Compute Sum of Squares

Measure variability using sum of squares (SS)

**Total Sum of Squares:

SS_{total} = \sum (x_i - \mu_{grand})^2

**Within-Group Sum of Squares:

SS_{within} = \sum (x_i - \mu_i)^2

**Between-Group Sum of Squares:

SS_{between} = SS_{total} - SS_{within}

This separates variability due to group differences from random error.

Python `

SS_between = sum([len(team)*(np.mean(team)-overall_mean)**2 for team in data]) SS_within = sum([sum((x-np.mean(team))**2 for x in team) for team in data])

Step 6: Compute Mean Squares

Convert sums of squares into mean squares by dividing by their respective degrees of freedom

MS_{between} = \frac{SS_{between}}{df_{between}}

MS_{within} = \frac{SS_{within}}{df_{within}}

Python `

k = len(data) n = sum(len(team) for team in data)

df_between = k - 1 df_within = n - k

MS_between = SS_between / df_between MS_within = SS_within / df_within

Step 7: Calculate the F-Statistic

F_{calc} = \frac{MS_{between}}{MS_{within}}

This is the test statistic used to compare against the critical F-value.

Python `

F_stat = MS_between / MS_within from scipy import stats

p_value = 1 - stats.f.cdf(F_stat, df_between, df_within) print(f"F-statistic: {F_stat:.4f}")

**Output:

F-statistic: 208.4164

Step 8: Test Assumptions

**Normality: Each group should be normally distributed.
**Equal variance: Variances across groups should be similar. Python `

for i, team in enumerate(data, start=1): stat, p = stats.shapiro(team) print(f"Team {chr(64+i)} p-value: {p:.4f}")

stat, p = stats.levene(*data) print(f"Levene's Test p-value: {p:.4f}")

**Output:

Team A p-value: 0.7796

Team B p-value: 0.8299

Team C p-value: 0.4944

Levene's Test p-value: 0.1541

Step 9: Making the Statistical Decision

After computing the F-statistic, decide whether to reject or fail to reject H_{0}

**1. Using the F-Table

Compare the calculated F-value (F_{calc}) with the critical F-value from the F-distribution table (F_{table}) at the chosen significance level (\alpha):

if F_{calc} < F_{table}: Do not reject H_{0} all group means are equal
if F_{calc} > F_{table} : Reject H_{0} at least one group mean is significantly different

**2. Using the p-value

Compare p-value with significance level \alpha:

Python `

alpha = 0.05 if p_value < alpha: print("Reject H0: At least one team mean is significantly different") else: print("Fail to reject H0: No significant difference between team means")

**Output:

Reject H0: At least one team mean is significantly different

You can download full code from here

Advantages

Can test multiple groups simultaneously, unlike t-tests which are limited to two groups.
Reduces the Type I error that occurs when performing multiple t-tests.
Easy to implement and interpret with statistical software.
Provides a quantitative measure (F-statistic) to evaluate group differences.

Limitations

Assumes normality, homogeneity of variances and independence of observations.
Only identifies that a difference exists, not which specific groups differ (requires post-hoc tests).
Sensitive to outliers, which can distort results.
Cannot handle more than one independent variable; for that, two-way ANOVA is needed.