Sample Variance

Last Updated : 23 Jul, 2025

In statistics, sample variance measures how spread out the data points in a sample are around the sample mean. It is computed by summing the squared differences between each data point and the mean, then dividing by n - 1, one less than the number of observations. It is useful when only a sample drawn from a larger population is available, because it estimates how variable that population is.

Mathematical Definition of Sample Variance

The formula for the sample variance is given by:

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}

**Where:**

n = Number of observations in the sample
x_i = Each individual observation
\bar{x} = Sample mean, calculated as:

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
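The formula can be traced step by step in plain Python. This is a minimal sketch using the same five values as the Python example later in this article; the variable names are illustrative only.

```python
# Step-by-step sample variance for a small data set
data = [4, 8, 6, 5, 10]

n = len(data)                                     # number of observations
mean = sum(data) / n                              # sample mean (x-bar)
squared_devs = [(x - mean) ** 2 for x in data]    # (x_i - x-bar)^2 for each point
sample_variance = sum(squared_devs) / (n - 1)     # divide by n - 1, not n

print(f"Mean: {mean:.2f}")                                     # Mean: 6.60
print(f"Sum of squared deviations: {sum(squared_devs):.2f}")   # 23.20
print(f"Sample variance: {sample_variance:.2f}")               # 5.80
```

The sum of squared deviations is 23.2, and dividing by n - 1 = 4 gives 5.8.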

Bias in Estimating Variance

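When variance is estimated from a sample, deviations are measured from the sample mean \bar{x} rather than from the unknown population mean. Because \bar{x} is computed from the same data, the squared deviations around it are, on average, smaller than those around the true mean, so dividing by n underestimates the population variance. Dividing by n - 1 instead, known as Bessel's correction, removes this bias and makes the sample variance an unbiased estimator of the population variance.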
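The size of the bias can be worked out in a few lines. The sketch below assumes the observations are independent draws from a population with mean \mu and finite variance \sigma^2 (symbols not used elsewhere in this article), and relies on the standard fact that the sample mean then has variance \sigma^2 / n.

```latex
% Split deviations from the true mean into deviations from the sample mean
\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2

% Take expectations: E[(x_i - \mu)^2] = \sigma^2 and E[(\bar{x} - \mu)^2] = \sigma^2 / n
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2

% Dividing by n - 1 therefore gives an unbiased estimator
E[s^2] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2
```

Dividing by n instead would give an expected value of \frac{n-1}{n}\sigma^2, which is always below \sigma^2.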

Why Use n - 1 in Sample Variance?

Dividing by n - 1 instead of n applies Bessel's correction. The deviations from the sample mean must sum to zero, so only n - 1 of them are free to vary; this is why n - 1 is called the degrees of freedom. Dividing by n - 1 therefore gives an unbiased estimate of the population variance, whereas dividing by n would underestimate it on average.
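A small simulation can make the effect visible. The sketch below is illustrative only: it assumes a normal population with standard deviation 2 (true variance 4), a sample size of 5, and an arbitrary fixed seed. It averages both estimators over many samples.

```python
import random

rng = random.Random(0)        # fixed seed (arbitrary choice) so the run is repeatable
sigma = 2.0                   # assumed population standard deviation
true_variance = sigma ** 2    # 4.0
n = 5                         # small samples make the bias easy to see
trials = 20_000

sum_biased = 0.0
sum_unbiased = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
    sum_biased += ss / n                        # divide by n
    sum_unbiased += ss / (n - 1)                # divide by n - 1 (Bessel's correction)

avg_biased = sum_biased / trials
avg_unbiased = sum_unbiased / trials

print(f"True variance:                  {true_variance:.2f}")
print(f"Average when dividing by n:     {avg_biased:.2f}")
print(f"Average when dividing by n - 1: {avg_unbiased:.2f}")
```

Under these assumptions the average of the divide-by-n estimates should land near (n - 1)/n × 4 = 3.2, while the divide-by-(n - 1) estimates should average close to 4.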

Properties of Sample Variance

**Non-negative:** Since it involves squaring deviations, the variance is always non-negative.
**Zero Variance:** A variance of zero indicates that all data points are identical.
**Units:** The variance is expressed in squared units of the original data, which is why the standard deviation is often used as a more interpretable measure.
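These properties can be checked with Python's standard `statistics` module. The height values below are hypothetical and serve only to illustrate the units.

```python
import statistics

heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]   # hypothetical measurements in cm
constant = [7.0, 7.0, 7.0, 7.0]                    # all observations identical

var_cm2 = statistics.variance(heights_cm)   # sample variance, in cm^2
std_cm = statistics.stdev(heights_cm)       # sample standard deviation, in cm
var_constant = statistics.variance(constant)

print(f"Variance: {var_cm2} cm^2")                 # Variance: 62.5 cm^2
print(f"Standard deviation: {std_cm:.2f} cm")      # Standard deviation: 7.91 cm
print(f"Variance of identical values: {var_constant}")
```

The variance is never negative, it is zero for the constant list, and the standard deviation (the square root of the variance) is back in the original unit of centimetres.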

Difference Between Sample Variance and Population Variance

| Feature | Sample Variance | Population Variance |
|---|---|---|
| Denominator | n - 1 | N |
| Purpose | Estimate population variance from a sample | Exact variance for the entire population |
| Bias | Unbiased due to Bessel’s correction | No correction needed |
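The two denominators can be compared directly on the same data. This sketch uses `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by N) from the Python standard library; treating the five values as a full population is purely for illustration.

```python
import statistics

data = [4, 8, 6, 5, 10]
n = len(data)

sample_var = statistics.variance(data)        # denominator n - 1
population_var = statistics.pvariance(data)   # denominator N

print(f"Sample variance (n - 1):  {sample_var:.2f}")      # 5.80
print(f"Population variance (N):  {population_var:.2f}")  # 4.64
print(f"Ratio: {sample_var / population_var:.2f}")        # n / (n - 1) = 1.25
```

The two results always differ by the factor n / (n - 1), which shrinks toward 1 as the sample grows. Note that NumPy's `np.var` uses `ddof=0` by default, which is the population formula; pass `ddof=1` to get the sample variance.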

Sample Variance in Python

```python
import numpy as np

data = [4, 8, 6, 5, 10]

# Calculating sample variance (ddof=1 divides by n - 1)
sample_variance = np.var(data, ddof=1)
print(f"Sample Variance: {sample_variance:.2f}")
```

**Output:**

```
Sample Variance: 5.80
```

Limitations of Sample Variance

**Sensitive to Outliers:** A few extreme values can significantly affect the variance.
**Not Intuitive:** The units are in squared terms, making interpretation difficult without taking the square root (standard deviation).
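The sensitivity to outliers is easy to demonstrate. In the sketch below, a single hypothetical extreme value (100) is appended to the five values used earlier; the value is chosen only to exaggerate the effect.

```python
import statistics

data = [4, 8, 6, 5, 10]
with_outlier = data + [100]   # hypothetical extreme value added for illustration

var_clean = statistics.variance(data)
var_outlier = statistics.variance(with_outlier)

print(f"Variance without outlier: {var_clean:.2f}")
print(f"Variance with outlier:    {var_outlier:.2f}")
print(f"Std dev without outlier:  {statistics.stdev(data):.2f}")
print(f"Std dev with outlier:     {statistics.stdev(with_outlier):.2f}")
```

Because deviations are squared, one distant point dominates the sum: here the variance grows by more than two orders of magnitude. Taking the square root brings the result back to the original units, but it does not remove the influence of the outlier.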