SciPy Statistical Significance Tests

Statistical significance tests help determine whether the results we see in data are real or could simply have happened by chance. They check whether differences or patterns in data are meaningful. SciPy's scipy.stats module provides these tests as simple Python functions, which are widely used in research, experiments and data analysis.

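Before running any test, it is essential to understand the following key terms:

**Null hypothesis (H₀):** the default assumption that there is no effect or no difference.

**Alternative hypothesis (H₁):** the claim that there is an effect or a difference.

**Significance level (α):** the threshold used to decide significance, commonly 0.05. It is the probability of rejecting H₀ when H₀ is actually true.

**P-value:** the probability of getting results at least as extreme as the observed ones, assuming H₀ is true.

The decision rule is:

If p-value ≤ α -> reject H₀ (there is a significant effect)
If p-value > α -> fail to reject H₀ (no significant effect found)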
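To see what α means in practice, the sketch below simulates many experiments in which H₀ is true by construction (every sample is drawn from a normal population whose mean really is 50) and counts how often a one-sample t-test still returns p ≤ α. The helper name false_positive_rate, the sample size and the seed are illustrative choices, not part of SciPy.

```python
import numpy as np
from scipy import stats

def false_positive_rate(alpha=0.05, n_experiments=2000, sample_size=10, seed=0):
    """Hypothetical helper: share of simulated experiments where a true H0 is rejected."""
    rng = np.random.default_rng(seed)
    # Each row is one experiment drawn from a population whose true mean IS 50,
    # so H0 (mean == 50) holds and every rejection is a false positive.
    samples = rng.normal(loc=50, scale=1, size=(n_experiments, sample_size))
    # Run a one-sample t-test on every row at once (axis=1).
    result = stats.ttest_1samp(samples, popmean=50, axis=1)
    return np.mean(result.pvalue <= alpha)

rate = false_positive_rate()
print("Share of tests rejecting a true H0:", round(rate, 3))
```

The printed share should land close to 0.05: choosing α = 0.05 means accepting roughly a 5% chance of a false positive when there is no real effect.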

Commonly Used Statistical Significance Tests

Some of the most widely used statistical significance tests are listed below. Each test serves a specific purpose in analyzing data and checking for meaningful patterns or differences.

1. Hypothesis Test

Hypothesis testing is a way to make decisions using data. It checks whether sample data provide enough evidence against an assumption (called a hypothesis) about a population.

**Example:** This example tests whether the average weight of a product is significantly different from 50 grams, using sample data.

```python
from scipy import stats

sample_weights = [49.2, 50.1, 50.3, 49.8, 50.5]
hypothesized_mean = 50

# Perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_weights, hypothesized_mean)
print("T-statistic:", round(t_stat, 4))
print("P-value:", round(p_value, 4))

# Significance level
alpha = 0.05
if p_value <= alpha:
    print("The null hypothesis is rejected. The average weight is significantly different from 50.")
else:
    print("The null hypothesis is not rejected. No significant difference from 50.")
```

**Output:**

T-statistic: -0.0882
P-value: 0.9339
The null hypothesis is not rejected. No significant difference from 50.

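**Explanation:** stats.ttest_1samp() compares the sample mean (49.98 g) with the hypothesized mean of 50 g and returns the t-statistic and a two-sided p-value. Because the p-value (0.9339) is greater than α = 0.05, H₀ is not rejected: the sample gives no evidence that the average weight differs from 50 g.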
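The one-sample t-statistic is t = (x̄ - μ₀) / (s / √n), where x̄ is the sample mean, μ₀ the hypothesized mean, s the sample standard deviation (dividing by n - 1) and n the sample size. As a minimal sketch, the snippet below computes it by hand for the same weights and checks it against ttest_1samp(); the two-sided p-value comes from the t-distribution with n - 1 degrees of freedom through stats.t.sf().

```python
import math
from scipy import stats

sample_weights = [49.2, 50.1, 50.3, 49.8, 50.5]
hypothesized_mean = 50

n = len(sample_weights)
mean = sum(sample_weights) / n
# Sample standard deviation with n - 1 in the denominator
sd = math.sqrt(sum((x - mean) ** 2 for x in sample_weights) / (n - 1))
# t = (sample mean - hypothesized mean) / standard error
t_manual = (mean - hypothesized_mean) / (sd / math.sqrt(n))
# Two-sided p-value: probability of |T| >= |t| with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample_weights, hypothesized_mean)
print("Manual:", round(t_manual, 4), round(p_manual, 4))  # -0.0882 0.9339
print("SciPy: ", round(t_scipy, 4), round(p_scipy, 4))
```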

2. T-Test

A t-test is a type of hypothesis test used to compare **averages (means)**. The independent two-sample version used here checks whether the difference between two group means is real or could have happened by chance. The t-test is designed for small samples where the population standard deviation is unknown; it assumes the data in each group are roughly normally distributed, and its test statistic follows a t-distribution.

**Example:** A restaurant wants to test whether adding chipotle sauce to a dish has affected its average sales. Two sets of sales data are collected: one with chipotle and one without.

```python
from scipy import stats

# Sample sales data
sales_with_chipotle = [13.4, 10.9, 11.2, 11.8, 14, 15.3, 14.2, 12.6, 17, 16.2, 16.5, 15.7]
sales_without_chipotle = [12, 11.7, 10.7, 11.2, 14.8, 14.4, 13.9, 13.7, 16.9, 16, 15.6, 16]

# Perform the t-test
t_stat, p_value = stats.ttest_ind(sales_with_chipotle, sales_without_chipotle)
print("T-statistic:", round(t_stat, 4))
print("P-value:", round(p_value, 4))

# Significance level
alpha = 0.05
if p_value <= alpha:
    print("The null hypothesis is rejected. Chipotle sauce has a significant effect on sales.")
else:
    print("The null hypothesis is not rejected. Chipotle sauce does not have a significant effect on sales.")
```

**Output:**

T-statistic: 0.1846
P-value: 0.8552
The null hypothesis is not rejected. Chipotle sauce does not have a significant effect on sales.

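**Explanation:** stats.ttest_ind() runs an independent two-sample t-test, which by default assumes both groups have equal variances. The p-value (0.8552) is well above α = 0.05, so the difference in mean sales is consistent with chance and H₀ (equal means) is not rejected.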
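ttest_ind() assumes equal variances unless equal_var=False is passed, in which case it runs Welch's t-test. The sketch below compares both versions on the same sales data.

```python
from scipy import stats

sales_with_chipotle = [13.4, 10.9, 11.2, 11.8, 14, 15.3, 14.2, 12.6, 17, 16.2, 16.5, 15.7]
sales_without_chipotle = [12, 11.7, 10.7, 11.2, 14.8, 14.4, 13.9, 13.7, 16.9, 16, 15.6, 16]

# Student's t-test (the default): assumes both groups share one variance
student = stats.ttest_ind(sales_with_chipotle, sales_without_chipotle)
# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(sales_with_chipotle, sales_without_chipotle, equal_var=False)

print("Student: t =", round(student.statistic, 4), " p =", round(student.pvalue, 4))
print("Welch:   t =", round(welch.statistic, 4), " p =", round(welch.pvalue, 4))
```

With equal group sizes, as here, both versions produce the same t-statistic. Only the degrees of freedom differ, so Welch's p-value comes out slightly larger. Welch's version is a safer choice when the group variances may differ.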

3. Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov (KS) test checks whether a dataset follows a specific distribution. The one-sample version compares data against a reference distribution, for example to test whether data are normally or uniformly distributed. The two-sample version compares two datasets with each other.

**Note:** This test is valid only for continuous distributions.

**Example:** In this example, 1000 values are randomly generated from a uniform distribution, and the one-sample KS test checks whether the data really follow a uniform distribution.

```python
import numpy as np
from scipy.stats import kstest

# Generate 1000 values from a uniform distribution on [0, 1)
# (no seed is set, so the values change on every run)
v = np.random.uniform(size=1000)
res = kstest(v, 'uniform')  # Kolmogorov–Smirnov test against the standard uniform U(0, 1)

# Print test statistic and p-value
print("K–S Test Statistic:", round(res.statistic, 4))
print("P-value:", round(res.pvalue, 4))

# Set significance level
alpha = 0.05
if res.pvalue > alpha:
    print("The null hypothesis is not rejected.")
    print("Conclusion: The data likely follows a uniform distribution.")
else:
    print("The null hypothesis is rejected.")
    print("Conclusion: The data does not follow a uniform distribution.")
```

**Output** (values vary between runs because no random seed is set):

K–S Test Statistic: 0.0277
P-value: 0.4194
The null hypothesis is not rejected.
Conclusion: The data likely follows a uniform distribution.

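**Explanation:** kstest(v, 'uniform') compares the empirical distribution of v with the standard uniform distribution on [0, 1]. The test statistic is the largest vertical gap between the two cumulative distribution functions. A small gap (0.0277 here) and a p-value above α = 0.05 mean the sample is consistent with a uniform distribution, so H₀ is not rejected.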
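For the two-sample case, scipy.stats.ks_2samp() compares two datasets directly without naming a reference distribution. The sketch below uses two deliberately different samples; the seed and sample sizes are arbitrary choices.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
sample_a = rng.uniform(size=500)  # values from a uniform distribution on [0, 1)
sample_b = rng.normal(size=500)   # values from a standard normal distribution

# H0: both samples come from the same continuous distribution
res = ks_2samp(sample_a, sample_b)
print("K–S Test Statistic:", round(res.statistic, 4))
print("P-value:", res.pvalue)

alpha = 0.05
if res.pvalue <= alpha:
    print("The null hypothesis is rejected. The samples come from different distributions.")
else:
    print("The null hypothesis is not rejected. No evidence that the distributions differ.")
```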

4. Normality Tests (Skewness and Kurtosis)

Normality tests help check whether data follows a normal (bell-shaped) distribution. Two common indicators are:

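**Skewness:** measures the asymmetry of the distribution. A normal distribution is symmetric, so its skewness is 0.

**Kurtosis:** measures how heavy or light the tails are compared with a normal distribution, whose kurtosis is 3 (an excess kurtosis of 0).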
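Both indicators are built from central moments. Writing m_k for the average of (x - x̄)^k, skewness is m₃ / m₂^1.5 and kurtosis is m₄ / m₂². The sketch below computes them by hand for a tiny symmetric dataset with a hypothetical helper, moment_stats. These biased (divide-by-n) formulas are the defaults of scipy.stats.skew() and scipy.stats.kurtosis(); kurtosis() returns excess kurtosis unless fisher=False is passed.

```python
import numpy as np
from scipy import stats

def moment_stats(x):
    """Hypothetical helper: biased sample skewness and Pearson kurtosis."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment (variance, dividing by n)
    m3 = np.mean(d ** 3)  # third central moment
    m4 = np.mean(d ** 4)  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

data = [1, 2, 3, 4, 5]
skewness, kurt = moment_stats(data)
print("Skewness:", round(skewness, 4))  # 0.0, the data are symmetric
print("Kurtosis:", round(kurt, 4))      # 1.7, lighter tails than a normal distribution (3)

# Cross-check against SciPy's defaults
print(np.isclose(skewness, stats.skew(data)))
print(np.isclose(kurt, stats.kurtosis(data, fisher=False)))
```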

**Example:** This example computes the skewness and kurtosis of a randomly generated sample with scipy.stats.describe() to see whether they are close to the values expected for normal data (skewness near 0, kurtosis near 3).

```python
import numpy as np
from scipy.stats import describe

# Generate 100 values from a normal distribution (values vary between runs)
v = np.random.normal(size=100)
result = describe(v)

print("Skewness:", round(result.skewness, 4))  # near 0 for normal data
# describe() reports excess (Fisher) kurtosis; adding 3 gives Pearson kurtosis, which is 3 for a normal distribution
print("Kurtosis:", round(result.kurtosis + 3, 4))
```
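Skewness and kurtosis on their own are descriptive numbers, not significance tests. scipy.stats also provides formal tests built on them: skewtest() and kurtosistest() test each indicator separately, and normaltest() (based on D'Agostino and Pearson's test) combines both into one p-value, with H₀ being that the sample comes from a normal distribution. The sketch below compares normal data with clearly skewed exponential data; the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
samples = {
    "normal": rng.normal(size=100),
    "exponential": rng.exponential(size=100),  # strongly right-skewed
}

alpha = 0.05
results = {}
for name, data in samples.items():
    skew_p = stats.skewtest(data).pvalue      # H0: skewness matches a normal distribution
    kurt_p = stats.kurtosistest(data).pvalue  # H0: kurtosis matches a normal distribution
    norm_p = stats.normaltest(data).pvalue    # H0: data come from a normal distribution
    results[name] = norm_p
    verdict = "rejected" if norm_p <= alpha else "not rejected"
    print(f"{name}: skewtest p={skew_p:.4f}, kurtosistest p={kurt_p:.4f}, "
          f"normaltest p={norm_p:.4f} -> normality {verdict}")
```

Failing to reject H₀ does not prove the data are normal; it only means the sample gives no strong evidence against normality.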