SciPy stats.describe() Function
The scipy.stats.describe() function computes several descriptive statistics for a dataset in a single call: the number of observations, the minimum and maximum, the mean, the variance, the skewness and the kurtosis. It is a convenient one-stop summary for numerical data analysis.
**Example:**

```python
from scipy.stats import describe

d = [4, 8, 6, 5, 3, 7, 9]
res = describe(d)
print(res)
```
**Output:**

```
DescribeResult(nobs=np.int64(7), minmax=(np.int64(3), np.int64(9)), mean=np.float64(6.0), variance=np.float64(4.666666666666667), skewness=np.float64(0.0), kurtosis=np.float64(-1.25))
```
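**Explanation:** The data is symmetric about its mean of 6, so the skewness is 0. The kurtosis is negative (-1.25), which means the distribution has lighter tails and a flatter shape than a normal distribution.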
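To make these two numbers less of a black box, the sketch below recomputes them from central moments with plain NumPy. It assumes the default bias=True and ddof=1 settings, under which skewness is m3 / m2^1.5 and Fisher kurtosis is m4 / m2^2 - 3, where m2, m3 and m4 are the biased central moments of the data.

```python
import numpy as np
from scipy.stats import describe

d = np.array([4, 8, 6, 5, 3, 7, 9], dtype=float)
res = describe(d)

dev = d - d.mean()         # deviations from the mean (6.0)
m2 = np.mean(dev**2)       # 2nd central moment: 28 / 7 = 4.0
m3 = np.mean(dev**3)       # 3rd central moment: 0.0, the data is symmetric
m4 = np.mean(dev**4)       # 4th central moment: 196 / 7 = 28.0

sample_var = np.sum(dev**2) / (len(d) - 1)  # ddof=1 divides by n - 1 = 6
skewness = m3 / m2**1.5                     # biased skewness (bias=True)
kurt = m4 / m2**2 - 3                       # Fisher (excess) kurtosis

print(sample_var, skewness, kurt)
print(np.isclose(res.variance, sample_var), np.isclose(res.kurtosis, kurt))
```

Here m4 / m2^2 = 28 / 16 = 1.75, and subtracting 3 gives the -1.25 reported by describe().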
Syntax
```python
scipy.stats.describe(a, axis=0, ddof=1, bias=True, nan_policy='propagate')
```
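**Parameters:**
- **a** (array_like): Input data; a 1D or N-dimensional array.
- **axis** (int or None, default 0): Axis along which the statistics are computed; None computes them over the flattened array.
- **ddof** (int, default 1): Delta degrees of freedom, used for the variance only; the divisor is n - ddof.
- **bias** (bool, default True): If False, the skewness and kurtosis are corrected for statistical bias.
- **nan_policy** (str, default 'propagate'): How to handle NaNs: 'propagate' returns NaN, 'raise' raises an error and 'omit' ignores NaN values.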
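The effect of ddof and bias is easiest to see on the dataset from the first example. In this sketch, ddof changes only the variance divisor (n - ddof), while bias=False applies SciPy's small-sample correction to skewness and kurtosis; the approximate corrected value in the comment follows from the standard adjusted-kurtosis formula for n = 7.

```python
import numpy as np
from scipy.stats import describe

d = [4, 8, 6, 5, 3, 7, 9]

var_sample = describe(d).variance              # default ddof=1: 28 / 6
var_population = describe(d, ddof=0).variance  # ddof=0: 28 / 7 = 4.0

kurt_biased = describe(d).kurtosis                 # default bias=True: -1.25
kurt_corrected = describe(d, bias=False).kurtosis  # bias-corrected, about -1.2

print(var_sample, var_population)
print(kurt_biased, kurt_corrected)
```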
**Returns:** A DescribeResult named tuple with the fields nobs (number of observations), minmax (tuple of minimum and maximum), mean, variance (sample variance with the default ddof=1), skewness (a measure of asymmetry) and kurtosis (Fisher's definition, so 0 for a normal distribution).
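Because the result is a named tuple, individual statistics can be read by field name or unpacked in the field order listed above. A minimal sketch using the data from the first example:

```python
from scipy.stats import describe

res = describe([4, 8, 6, 5, 3, 7, 9])

# Access by field name; minmax is itself a (min, max) pair
low, high = res.minmax
print(res.nobs, low, high, res.mean)

# Positional unpacking follows the field order:
# nobs, minmax, mean, variance, skewness, kurtosis
nobs, minmax, mean, variance, skewness, kurtosis = res
print(variance, kurtosis)
```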
Examples
**Example 1: NumPy Array with axis=None**

```python
import numpy as np
from scipy.stats import describe

d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
res = describe(d, axis=None)
print(res)
```

**Explanation:** With axis=None, the 2D array is flattened and all 9 values are treated as one sample, so nobs is 9, the minimum and maximum are 1 and 9, the mean is 5.0 and the sample variance is 7.5.
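One way to check what axis=None does is to compare it with describing an explicitly flattened copy of the same array. This sketch uses a small 3x3 array and NumPy's ravel(); the two results should agree field by field.

```python
import numpy as np
from scipy.stats import describe

d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

pooled = describe(d, axis=None)   # all 9 values treated as one sample
flat = describe(d.ravel())        # the same values as an explicit 1-D array

print(pooled.nobs, pooled.mean, pooled.variance)
# Compare mean, variance, skewness and kurtosis (fields 2 onward)
print(all(np.allclose(a, b) for a, b in zip(pooled[2:], flat[2:])))
```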
**Example 2: 2D Array with axis=0 (Column-wise Stats)**

```python
import numpy as np
from scipy.stats import describe

d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
res = describe(d, axis=0)
print(res)
```
**Output:**

```
DescribeResult(nobs=np.int64(3), minmax=(array([1, 2, 3]), array([7, 8, 9])), mean=array([4., 5., 6.]), variance=array([9., 9., 9.]), skewness=array([0., 0., 0.]), kurtosis=array([-1.5, -1.5, -1.5]))
```
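**Explanation:** The statistics are calculated column-wise, giving one value per column. Each column ([1, 4, 7], [2, 5, 8] and [3, 6, 9]) has 3 evenly spaced values with the same spread, so the variance, skewness and kurtosis are identical across columns, while the mean, minimum and maximum shift by one from column to column.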
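Switching to axis=1 computes the same statistics row-wise instead. A short sketch on the same array; each row holds three consecutive integers, so every row should have a sample variance of 1.0.

```python
import numpy as np
from scipy.stats import describe

d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rows = describe(d, axis=1)   # one value per row instead of per column

print(rows.nobs)        # observations per row
print(rows.minmax)      # (row minima, row maxima)
print(rows.mean)        # row means
print(rows.variance)    # row sample variances
```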
**Example 3: Handling NaN with nan_policy='omit'**

```python
import numpy as np
from scipy.stats import describe

d = [1, 2, np.nan, 4, 5]
res = describe(d, nan_policy='omit')
print(res)
```
**Output:**

```
DescribeResult(nobs=np.int64(4), minmax=(masked_array(data=1.,
mask=False,
fill_value=1e+20), masked_array(data=5.,
mask=False,
fill_value=1e+20)), mean=np.float64(3.0), variance=np.float64(3.3333333333333335), skewness=masked_array(data=0.,
mask=False,
fill_value=1e+20), kurtosis=np.float64(-1.64))
```
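**Explanation:** With nan_policy='omit', the NaN is ignored and the statistics describe [1, 2, 4, 5] (nobs=4). Some fields come back as 0-d masked arrays because the NaN is masked out internally; they still hold ordinary values and can be converted with float() if plain numbers are needed.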
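For comparison, the sketch below runs the same data through all three nan_policy options. It assumes the behavior described under Parameters: 'propagate' lets the NaN flow into the results, 'raise' raises a ValueError, and 'omit' should agree with describing the data after dropping the NaN by hand.

```python
import numpy as np
from scipy.stats import describe

d = [1, 2, np.nan, 4, 5]

# 'propagate' (default): the NaN flows through into the statistics
propagated = describe(d)
print(propagated.mean)   # nan

# 'raise': refuse to compute when a NaN is present
raised = False
try:
    describe(d, nan_policy='raise')
except ValueError as err:
    raised = True
    print("ValueError:", err)

# 'omit': should match describing the NaN-free values directly
omitted = describe(d, nan_policy='omit')
clean = describe(np.array(d)[~np.isnan(d)])   # data reduced to [1., 2., 4., 5.]
print(float(omitted.mean), clean.mean)
print(np.isclose(float(omitted.kurtosis), clean.kurtosis))
```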