Sample Variance
Last Updated : 23 Jul, 2025
In statistics, sample variance measures how spread out the data points in a sample are around the sample mean. It is computed by summing the squared differences between each data point and the sample mean, then dividing by n - 1, which is one less than the number of observations. Sample variance is useful when you only have a sample drawn from a larger population. It describes how varied the sample is and serves as an estimate of the population variance.
Mathematical Definition of Sample Variance
The formula for the sample variance is given by:
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
**Where:**
- n = Number of observations in the sample
- x_i = Each individual observation
- \bar{x} = Sample mean, calculated as:

\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
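To make the formula concrete, here is a minimal from-scratch sketch in plain Python that follows the definition term by term: compute the sample mean, sum the squared deviations, and divide by n - 1. The function name `sample_variance` is our own choice for illustration, not a library API.

```python
def sample_variance(data):
    """Return s^2 = sum((x_i - x_bar)^2) / (n - 1) for a list of numbers."""
    n = len(data)
    if n < 2:
        # With a single observation, n - 1 = 0 and s^2 is undefined.
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(data) / n                            # sample mean
    squared_devs = [(x - x_bar) ** 2 for x in data]  # (x_i - x_bar)^2
    return sum(squared_devs) / (n - 1)               # Bessel-corrected denominator


data = [4, 8, 6, 5, 10]
print(f"{sample_variance(data):.2f}")  # 5.80
```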
Bias in Estimating Variance
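When a sample is used to estimate the population variance, the deviations are measured from the sample mean \bar{x} rather than the unknown population mean \mu. Because \bar{x} is the value that minimizes the sum of squared deviations of the sample, \sum (x_i - \bar{x})^2 is never larger than \sum (x_i - \mu)^2, so it tends to understate the true spread. Dividing by n - 1 instead of n compensates for this bias. The adjustment, known as Bessel's correction, makes s^2 an unbiased estimator of the population variance \sigma^2 when the observations are independent.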
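The unbiasedness claim can be checked exactly on a toy example. The sketch below assumes samples are drawn independently with replacement from a small, made-up population, enumerates every possible ordered sample of size 2, and averages both estimators using exact fractions. Averaged over all samples, dividing by n - 1 recovers the population variance exactly, while dividing by n falls short by the factor (n - 1)/n. The population values and the sample size are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 6]  # hypothetical population
n = 2                      # sample size


def sum_sq_dev(xs):
    """Sum of squared deviations from the mean of xs, computed exactly."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)


# Population variance divides by N, the population size.
pop_var = sum_sq_dev(population) / len(population)

# Every ordered sample of size n drawn with replacement is equally likely,
# so a plain average over all of them is the exact expected value.
samples = list(product(population, repeat=n))
mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(pop_var)        # 7/2
print(mean_unbiased)  # 7/2
print(mean_biased)    # 7/4
```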
Why Use n - 1 in Sample Variance?
Dividing the sum of squared deviations by n gives an estimator whose expected value is \frac{n-1}{n}\sigma^2, so it systematically underestimates the population variance, most noticeably for small samples. Dividing by n - 1 instead makes the expected value exactly \sigma^2. Intuitively, the n deviations x_i - \bar{x} always sum to zero, so only n - 1 of them can vary freely; the sample variance is therefore said to have n - 1 degrees of freedom.
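One way to see where the n - 1 comes from is that the deviations from the sample mean always cancel out, so once any n - 1 of them are known, the last one is fixed. The short sketch below checks this on an example list, using exact fractions because floating-point arithmetic would give a sum that is only approximately zero.

```python
from fractions import Fraction

data = [4, 8, 6, 5, 10]         # example observations
n = len(data)
x_bar = Fraction(sum(data), n)  # exact sample mean, 33/5

deviations = [x - x_bar for x in data]
print(sum(deviations))  # 0

# The last deviation is determined by the first n - 1 of them.
last_from_others = -sum(deviations[:-1])
print(last_from_others == deviations[-1])  # True
```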
Properties of Sample Variance
- **Non-negative:** Since it involves squaring deviations, the variance is always non-negative.
- **Zero Variance:** A variance of zero indicates that all data points are identical.
- **Units:** The variance is expressed in squared units of the original data, which is why the standard deviation is often used as a more interpretable measure.
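The three properties above can be checked with Python's standard-library `statistics` module, whose `variance` function uses the n - 1 denominator and whose `stdev` is its square root. The height values below are made up for illustration. Expressing the same heights in metres instead of centimetres shrinks the variance by a factor of 100² = 10,000 but the standard deviation only by 100, which is what "squared units" means in practice.

```python
import math
import statistics

heights_cm = [150.0, 160.0, 170.0]         # hypothetical heights in centimetres
heights_m = [h / 100 for h in heights_cm]  # the same heights in metres

var_cm = statistics.variance(heights_cm)  # sample variance, in cm^2
var_m = statistics.variance(heights_m)    # sample variance, in m^2

# Non-negative: a sum of squared deviations can never be negative.
print(var_cm >= 0)  # True
# Zero variance: identical observations have no spread at all.
print(statistics.variance([7.0, 7.0, 7.0]) == 0)  # True
# Squared units: rescaling the data by 100 rescales the variance by 100**2.
print(math.isclose(var_cm, var_m * 100**2))  # True
# The standard deviation rescales by 100 and stays in the data's own units.
print(math.isclose(statistics.stdev(heights_cm), statistics.stdev(heights_m) * 100))  # True
```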
Difference Between Sample Variance and Population Variance
| Feature | Sample Variance | Population Variance |
|---|---|---|
| Denominator | n - 1 | N |
| Purpose | Estimate population variance from a sample | Exact variance for the entire population |
| Bias | Unbiased due to Bessel’s correction | No correction needed |
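To see the two denominators side by side, the sketch below uses the standard-library `statistics` module, where `pvariance` divides by N and `variance` divides by n - 1. NumPy's `np.var` divides by the count by default (`ddof=0`), so it matches the population formula unless `ddof=1` is passed. The data are the same five values used in the Python example below.

```python
import statistics

data = [4, 8, 6, 5, 10]
n = len(data)

pop_var = statistics.pvariance(data)    # divides by N = 5
sample_var = statistics.variance(data)  # divides by n - 1 = 4

print(pop_var, sample_var)
# The two differ only by the factor n / (n - 1).
print(abs(sample_var - pop_var * n / (n - 1)) < 1e-12)  # True
```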
Sample Variance in Python
```python
import numpy as np

data = [4, 8, 6, 5, 10]

# Calculating sample variance: ddof=1 makes NumPy divide by n - 1
sample_variance = np.var(data, ddof=1)
print(f"Sample Variance: {sample_variance:.2f}")
```
**Output:**
Sample Variance: 5.80
Limitations of Sample Variance
- **Sensitive to Outliers:** A few extreme values can significantly affect the variance.
- **Not Intuitive:** The units are in squared terms, making interpretation difficult without taking the square root (standard deviation).
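The sensitivity to outliers is easy to demonstrate. In the sketch below, the last of the five example values is replaced by a hypothetical extreme value (100) while the sample size stays the same, and the sample variance is recomputed with the standard-library `statistics.variance`. Because that single deviation is squared, the variance grows by a factor of more than 300.

```python
import statistics

data = [4, 8, 6, 5, 10]
with_outlier = [4, 8, 6, 5, 100]  # hypothetical: last value replaced by an extreme one

print(f"{statistics.variance(data):.2f}")          # 5.80
print(f"{statistics.variance(with_outlier):.2f}")  # 1778.80
```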