Non-Parametric Methods in Statistics

Non-parametric methods in statistics are techniques that do not assume a specific probability distribution for the data. Unlike parametric methods, which assume a distribution family described by a fixed set of parameters (e.g., the mean and variance of a normal distribution), non-parametric methods are more flexible and are useful when the underlying distribution is unknown or complex. These methods are widely applied in hypothesis testing, regression, density estimation and classification.

Common Non-Parametric Statistical Tests

Wilcoxon Rank-Sum Test (Mann-Whitney U Test)

Used to compare two independent groups when normality assumptions do not hold. The test works on the ranks of the pooled observations rather than on the raw values.

U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1

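where:

- n_1 and n_2 are the sizes of the two samples
- R_1 is the sum of the ranks of the first sample in the pooled, ranked data

```python
from scipy.stats import mannwhitneyu

# Two small independent samples
x = [3, 5, 7, 9]
y = [2, 4, 6, 8]

# Two-sided test by default
stat, p = mannwhitneyu(x, y)
print("Mann-Whitney U test statistic:", stat, "p-value:", p)
```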

Output:

Mann-Whitney U test statistic: 10.0 p-value: 0.6857142857142857
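As a check on the formula above, the statistic can be computed by hand from the ranks. The sketch below reuses the same two samples; the variable names (`r1`, `u_formula`, `u_scipy`) are illustrative. Note that SciPy reports the U statistic of the first sample, U_1 = R_1 - n_1(n_1 + 1)/2, which is the complement of the U in the formula above: the two values add up to n_1 n_2, and the two-sided p-value is the same either way.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

x = [3, 5, 7, 9]
y = [2, 4, 6, 8]
n1, n2 = len(x), len(y)

# Rank the pooled observations (tied values would receive average ranks).
ranks = rankdata(np.concatenate([x, y]))
r1 = ranks[:n1].sum()  # R_1: sum of the ranks that belong to x

# U exactly as written in the formula above.
u_formula = n1 * n2 + n1 * (n1 + 1) / 2 - r1

# SciPy's convention: U for the first sample.
u_scipy = mannwhitneyu(x, y).statistic

print("R_1:", r1)
print("U from the formula:", u_formula)
print("U from SciPy:", u_scipy)
print("Sum equals n1 * n2:", u_formula + u_scipy == n1 * n2)
```

Here R_1 = 20, so the formula gives U = 16 + 10 - 20 = 6, while SciPy reports 10.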

Kruskal-Wallis Test

A non-parametric alternative to one-way ANOVA for comparing three or more independent groups.

H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)

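where:

- N is the total number of observations across all groups
- n_i is the number of observations in group i
- R_i is the sum of the ranks of group i in the pooled, ranked data

```python
from scipy.stats import kruskal

# Three independent groups
stat, p = kruskal([1, 2, 3], [4, 5, 6], [7, 8, 9])
print("Kruskal-Wallis test statistic:", stat, "p-value:", p)
```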

Output:

Kruskal-Wallis test statistic: 7.200000000000003 p-value: 0.02732372244729252
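The H statistic can also be reproduced directly from the formula. The following is a minimal sketch for the same three groups, assuming there are no tied values (SciPy applies a tie correction that the plain formula does not include); the helper names are illustrative.

```python
import numpy as np
from scipy.stats import kruskal, rankdata

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sizes = [len(g) for g in groups]
N = sum(sizes)

# Rank the pooled data, then split the ranks back into their groups.
ranks = rankdata(np.concatenate(groups))
split_points = np.cumsum(sizes)[:-1]
rank_sums = [r.sum() for r in np.split(ranks, split_points)]  # R_i

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H = 12 / (N * (N + 1)) * sum(R**2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

print("Rank sums:", rank_sums)
print("H from the formula:", H)
print("H from SciPy:", kruskal(*groups).statistic)
```

The rank sums are 6, 15 and 24, which gives H = (12 / 90) * 279 - 30 = 7.2, in agreement with the SciPy result up to floating-point rounding.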

Non-Parametric Estimation and Regression

1. Kernel Density Estimation (KDE)

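KDE is a technique to estimate the probability density function (PDF) of a dataset. It places a smooth kernel on each observation and averages the kernels to obtain a continuous density estimate.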

\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K \left( \frac{x - x_i}{h} \right)

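where:

- n is the number of observations
- x_i are the observed data points
- h is the bandwidth, which controls how smooth the estimate is
- K is the kernel function (for example, a Gaussian)

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# 100 draws from a standard normal distribution
data = np.random.randn(100)

# bw_adjust < 1 narrows the default bandwidth, giving a less smooth curve
sns.kdeplot(data, bw_adjust=0.5)
plt.show()
```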

Output:

[Figure: KDE curve of the sample; y-axis: Density]
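To make the estimator above concrete, here is a minimal NumPy sketch that evaluates it directly with a Gaussian kernel. The function names `gaussian_kernel` and `kde`, the bandwidth `h = 0.4`, the seed and the evaluation grid are all illustrative assumptions, not values taken from the seaborn example.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, h):
    """Evaluate (1 / (n h)) * sum_i K((x - x_i) / h) at each grid point."""
    # Pairwise scaled differences, shape (len(x_grid), len(data)).
    u = (x_grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
data = rng.standard_normal(100)
h = 0.4                          # assumed bandwidth, chosen for illustration

x_grid = np.linspace(-6, 6, 601)
density = kde(x_grid, data, h)

# A density estimate should be non-negative and integrate to about 1.
dx = x_grid[1] - x_grid[0]
area = density.sum() * dx
print("Approximate area under the estimate:", round(area, 3))
```

A smaller `h` follows the individual observations more closely, while a larger `h` produces a smoother curve.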

2. k-Nearest Neighbors (k-NN) Regression

k-NN is a simple, non-parametric regression method that predicts the target variable based on the mean (or median) of the nearest k neighbors.

\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i

where y_i are the target values of the k nearest neighbors of the query point.

Implementation of K-Nearest Neighbors Regression

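```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Training data: a single feature with target y = 2x
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Predict by averaging the targets of the 2 nearest neighbors
knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X, y)
print(knn.predict([[3.5]]))
```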

Output:

[7.]
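The prediction follows directly from the averaging formula: the two training points closest to 3.5 are x = 3 and x = 4, whose targets 6 and 8 average to 7. A minimal NumPy sketch of the same computation for a single feature is shown below; `knn_predict` is an illustrative helper, not part of scikit-learn.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    """Average the targets of the k training points closest to x_query."""
    distances = np.abs(X_train - x_query)   # 1-D feature, so |x - x_i|
    nearest = np.argsort(distances)[:k]     # indices of the k smallest distances
    return y_train[nearest].mean()

X_train = np.array([1, 2, 3, 4, 5], dtype=float)
y_train = np.array([2, 4, 6, 8, 10], dtype=float)

prediction = knn_predict(X_train, y_train, 3.5, k=2)
print(prediction)  # 7.0
```

With more than one feature, the absolute difference would be replaced by a distance such as the Euclidean norm.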

3. Bootstrap Methods

Bootstrap methods are resampling techniques used to estimate the sampling distribution of a statistic.

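Algorithm:

1. Draw a sample of the same size as the original data, with replacement.
2. Compute the statistic of interest (here, the mean) on the resample.
3. Repeat steps 1 and 2 many times (here, 1000).
4. Use the collection of resampled statistics to approximate the sampling distribution of the statistic.

```python
from sklearn.utils import resample
import numpy as np

sample = np.array([3, 5, 7, 9, 11])

# Draw 1000 resamples of the original size, with replacement
bootstrap_samples = [
    resample(sample, replace=True, n_samples=len(sample)) for _ in range(1000)
]

# Statistic of interest on each resample
bootstrap_means = [np.mean(s) for s in bootstrap_samples]
print("Bootstrap Mean Estimate:", np.mean(bootstrap_means))
```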

Output (the exact value varies between runs because the resampling is random and no seed is set):

Bootstrap Mean Estimate: 6.9883999999999995
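Beyond a point estimate, the bootstrap distribution is typically used to quantify uncertainty. The sketch below, which uses NumPy only, estimates the standard error of the mean and a 95% percentile confidence interval for the same sample. The seed, the number of resamples (`n_boot = 2000`) and the choice of the percentile method are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
sample = np.array([3, 5, 7, 9, 11])
n_boot = 2000                    # assumed number of bootstrap resamples

# Each row is one resample of the original size, drawn with replacement.
resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)
boot_means = resamples.mean(axis=1)

# The spread of the bootstrap means estimates the standard error of the mean.
std_error = boot_means.std(ddof=1)

# Percentile method: take the middle 95% of the bootstrap distribution.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print("Sample mean:", sample.mean())
print("Bootstrap standard error:", round(std_error, 3))
print("95% percentile interval:", (ci_low, ci_high))
```

With only five observations the interval is wide, which reflects how little information such a small sample carries about the mean.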

Advantages

Disadvantages