
One-Way ANOVA

Last Updated : 23 Jan, 2026

Analysis of Variance (ANOVA) is a parametric statistical method used to determine whether there is a significant difference among the means of three or more groups by testing the null hypothesis that all group means are equal.

Figure: One-Way ANOVA

One-way ANOVA is the simplest form of ANOVA used when a single independent variable has three or more groups. It determines whether there are statistically significant differences among the group means by comparing within-group and between-group variation.

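Assumptions of ANOVA

One-way ANOVA assumes that the observations are independent, that the dependent variable is approximately normally distributed within each group, and that the groups have roughly equal variances (homogeneity of variance).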

When to Use One-Way ANOVA

A one-way Analysis of Variance (ANOVA) is used when you want to examine the effect of a single categorical independent variable on a quantitative dependent variable. The independent variable must consist of at least three distinct levels or groups.

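One-way ANOVA determines whether the mean of the dependent variable differs significantly across the levels of the independent variable. The worked example in this article, for instance, compares one numeric measurement across three teams (Team A, Team B and Team C).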

The null hypothesis (H_{0}) states that all group means are equal, indicating no effect of the independent variable. The alternative hypothesis (H_{a}) states that at least one group mean differs significantly from the others.

How to Perform One-Way ANOVA

One-way ANOVA is a hypothesis test used to determine whether the means of three or more groups differ significantly based on a single factor. The test statistic used is the F-statistic which compares between-group variance to within-group variance.

Step 1: Define Hypotheses

H_{0}: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k

H_{a}: \text{at least one } \mu_i \text{ differs from the others}

This step clarifies what you are testing and what outcome would lead you to reject the null.

Step 2: Compute Degrees of Freedom

Degrees of freedom (df) help determine the critical F-value from statistical tables.

Between groups: df_{between} = k - 1

Within groups: df_{within} = n - k

Total: df_{total} = df_{between} + df_{within} = n - 1

where k is the number of groups and n is the total number of observations across all groups.
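As a quick illustration of these formulas, the sketch below computes the three degrees of freedom for a design with three groups of nine observations each, which matches the size of the team example used later in this article. The helper name `anova_degrees_of_freedom` and the placeholder values are assumptions made for this illustration; only the group sizes matter.

```python
def anova_degrees_of_freedom(groups):
    """Return (df_between, df_within, df_total) for a list of groups."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    df_between = k - 1
    df_within = n - k
    return df_between, df_within, df_between + df_within


# Three groups of nine observations each (placeholder values, only sizes matter)
example_groups = [[0] * 9, [0] * 9, [0] * 9]
print(anova_degrees_of_freedom(example_groups))  # (2, 24, 26)
```

The total, 26, equals n - 1 = 27 - 1, which is a handy sanity check.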

Step 3: Understand the F-Statistic

The F-statistic is the ratio of variability between groups to variability within groups:

F = \frac{\text{Variance between groups}}{\text{Variance within groups}}

A larger F-value indicates that group means differ more than expected by chance.
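To make this concrete, here is a minimal sketch on made-up data (not from this article): three groups share the same within-group spread, and only the distance `d` between their means changes. The helper `f_for_separation` is a hypothetical name used for this illustration; `scipy.stats.f_oneway` does the actual computation.

```python
from scipy import stats


def f_for_separation(d):
    """F-statistic for three groups with means 0, d and 2d.

    Every group has the same within-group spread: its values sit at
    -1, 0 and +1 around the group mean.
    """
    base = [-1.0, 0.0, 1.0]
    groups = [[x + shift for x in base] for shift in (0.0, d, 2 * d)]
    return stats.f_oneway(*groups).statistic


# Within-group variation is fixed, so F reflects only the gap between means
for d in (0.0, 0.5, 1.0, 2.0):
    print(f"separation d={d}: F = {f_for_separation(d):.2f}")
```

For this particular construction F works out to 3d^2, so doubling the gap between the group means quadruples the F-statistic while the within-group noise stays the same.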

Step 4: Calculate Group Means and Grand Mean

Compute the mean of each group. Then calculate the grand mean across all observations:

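\mu_{grand} = \frac{\sum G}{n}

where \sum G is the sum of all observations (the group totals added together) and n is the total number of observations.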

```python
import numpy as np

# Observations for each of the three teams
team_A = [50, 47, 52, 46, 51, 48, 49, 47, 50]
team_B = [40, 42, 38, 41, 39, 40, 41, 39, 40]
team_C = [55, 54, 57, 53, 56, 55, 55, 54, 57]

data = [team_A, team_B, team_C]

# Mean of each group and grand mean over all observations
group_means = [np.mean(team) for team in data]
overall_mean = np.mean([x for team in data for x in team])

print("Group Means:", dict(zip(['Team A','Team B','Team C'], group_means)))
print("Overall Mean:", round(overall_mean, 2))
```

**Output:**

Group Means: {'Team A': np.float64(48.888888888888886), 'Team B': np.float64(40.0), 'Team C': np.float64(55.111111111111114)}

Overall Mean: 48.0

Step 5: Compute Sum of Squares

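Measure variability using sums of squares (SS).

**Total Sum of Squares:**

SS_{total} = \sum (x_i - \mu_{grand})^2

**Within-Group Sum of Squares:**

SS_{within} = \sum (x_i - \mu_i)^2

where \mu_i is the mean of the group that observation x_i belongs to.

**Between-Group Sum of Squares:**

SS_{between} = SS_{total} - SS_{within}

Equivalently, SS_{between} = \sum_{g=1}^{k} n_g (\mu_g - \mu_{grand})^2, where n_g and \mu_g are the size and mean of group g. This is the form the code computes.

This separates variability due to group differences from random error.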

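```python
# Between-group SS: squared distance of each group mean from the grand mean,
# weighted by group size
SS_between = sum(len(team) * (np.mean(team) - overall_mean) ** 2 for team in data)

# Within-group SS: squared distance of each observation from its own group mean
SS_within = sum(sum((x - np.mean(team)) ** 2 for x in team) for team in data)
```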
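The decomposition SS_{total} = SS_{between} + SS_{within} can be checked numerically. The following is a small self-contained sketch in plain Python that recomputes all three quantities for the team data from Step 4; the function name `sums_of_squares` is an assumption made for this illustration.

```python
import math


def sums_of_squares(groups):
    """Return (ss_total, ss_within, ss_between) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_total, ss_within, ss_between


teams = [
    [50, 47, 52, 46, 51, 48, 49, 47, 50],  # Team A
    [40, 42, 38, 41, 39, 40, 41, 39, 40],  # Team B
    [55, 54, 57, 53, 56, 55, 55, 54, 57],  # Team C
]

ss_total, ss_within, ss_between = sums_of_squares(teams)
print(f"SS_total   = {ss_total:.4f}")
print(f"SS_within  = {ss_within:.4f}")
print(f"SS_between = {ss_between:.4f}")

# The two components should add up to the total (up to rounding error)
print("Identity holds:", math.isclose(ss_total, ss_within + ss_between))
```

Computing SS_{between} directly and by subtraction should agree; a mismatch usually points to a group mean being confused with the grand mean.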

Step 6: Compute Mean Squares

Convert the sums of squares into mean squares by dividing each by its degrees of freedom:

MS_{between} = \frac{SS_{between}}{df_{between}}

MS_{within} = \frac{SS_{within}}{df_{within}}

```python
k = len(data)                        # number of groups
n = sum(len(team) for team in data)  # total number of observations

df_between = k - 1
df_within = n - k

MS_between = SS_between / df_between
MS_within = SS_within / df_within
```

Step 7: Calculate the F-Statistic

F_{calc} = \frac{MS_{between}}{MS_{within}}

This is the test statistic used to compare against the critical F-value.

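```python
from scipy import stats

F_stat = MS_between / MS_within

# Upper-tail probability of the F distribution; sf() equals 1 - cdf()
# but stays accurate when the p-value is extremely small
p_value = stats.f.sf(F_stat, df_between, df_within)
print(f"F-statistic: {F_stat:.4f}")
```

**Output:**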

F-statistic: 208.4164
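As a cross-check on the manual calculation, SciPy's `scipy.stats.f_oneway` runs the same one-way ANOVA in a single call. This is offered as a sanity check under the assumption that SciPy is installed, not as a replacement for the step-by-step walkthrough.

```python
from scipy import stats

team_A = [50, 47, 52, 46, 51, 48, 49, 47, 50]
team_B = [40, 42, 38, 41, 39, 40, 41, 39, 40]
team_C = [55, 54, 57, 53, 56, 55, 55, 54, 57]

# f_oneway takes one sequence per group and returns the F-statistic and p-value
result = stats.f_oneway(team_A, team_B, team_C)
print(f"F-statistic: {result.statistic:.4f}")
print(f"p-value: {result.pvalue:.3e}")
```

The F-statistic printed here should match the manually computed value of 208.4164.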

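Step 8: Test Assumptions

Check normality within each group with the Shapiro-Wilk test and equality of variances across groups with Levene's test. In both tests, a p-value above the significance level means there is no evidence that the assumption is violated.

```python
# Shapiro-Wilk test for normality, run separately on each team
for i, team in enumerate(data, start=1):
    stat, p = stats.shapiro(team)
    print(f"Team {chr(64+i)} p-value: {p:.4f}")

# Levene's test for equal variances across the teams
stat, p = stats.levene(*data)
print(f"Levene's Test p-value: {p:.4f}")
```

**Output:**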

Team A p-value: 0.7796

Team B p-value: 0.8299

Team C p-value: 0.4944

Levene's Test p-value: 0.1541

Step 9: Making the Statistical Decision

After computing the F-statistic, decide whether to reject or fail to reject H_{0}.

**1. Using the F-Table**

Compare the calculated F-value (F_{calc}) with the critical F-value from the F-distribution table (F_{table}) at the chosen significance level (\alpha). If F_{calc} > F_{table}, reject H_{0}; otherwise, fail to reject H_{0}.
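Instead of reading a printed table, the critical value can be obtained from the F distribution's quantile function. The sketch below assumes SciPy is available and uses `scipy.stats.f.ppf`; the degrees of freedom (2 and 24) and the F-value are the ones from the three-team example in this article, hard-coded here so the snippet runs on its own.

```python
from scipy import stats

alpha = 0.05
df_between, df_within = 2, 24  # k - 1 and n - k for 3 groups, 27 observations
F_calc = 208.4164              # F-statistic reported in Step 7

# Critical value: the point that leaves probability alpha in the upper tail
F_table = stats.f.ppf(1 - alpha, df_between, df_within)
print(f"Critical F at alpha={alpha}: {F_table:.4f}")

if F_calc > F_table:
    print("Reject H0")
else:
    print("Fail to reject H0")
```

The critical value for these degrees of freedom is about 3.40, far below the calculated F, so this route leads to rejecting H0.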

**2. Using the p-value**

Compare the p-value with the significance level \alpha. If the p-value is less than \alpha, reject H_{0}; otherwise, fail to reject H_{0}.

```python
alpha = 0.05
if p_value < alpha:
    print("Reject H0: At least one team mean is significantly different")
else:
    print("Fail to reject H0: No significant difference between team means")
```

**Output:**

Reject H0: At least one team mean is significantly different


Advantages

Limitations