K-Nearest Neighbor (KNN) Algorithm using Sklearn in Python

Last Updated : 23 Mar, 2026

K-Nearest Neighbors (KNN) finds the k training points closest to a given input (its neighbors) and predicts the input's class by majority vote among them (classification), or its value as the average of their values (regression). In this article we implement a KNN classifier using Python's Scikit-Learn library.
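To make the idea concrete, here is a minimal from-scratch sketch of KNN classification in NumPy. It assumes Euclidean distance and a plain majority vote; the helper name `knn_predict` and the toy data are hypothetical and are not part of Scikit-Learn.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Hypothetical helper: classify one point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count labels among the neighbors; argmax picks the most frequent one
    # (on a tie, the smallest label wins)
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Toy data: two well-separated clusters labeled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.5, 0.5]), k=3))  # prints 0
print(knn_predict(X_toy, y_toy, np.array([5.5, 5.5]), k=3))  # prints 1
```

For regression, the vote would be replaced by the mean of the neighbors' targets, e.g. `y_train[nearest].mean()`.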

1. **Generating and Visualizing the 2D Data**

```python
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Create synthetic 2D data: two interleaving half-moons with noise
X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

# Create a DataFrame for plotting
df = pd.DataFrame(X, columns=["Feature 1", "Feature 2"])
df['Target'] = y

# Visualize the 2D data, colored by class
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x="Feature 1", y="Feature 2", hue="Target", palette="Set1")
plt.title("2D Classification Data (make_moons)")
plt.grid(True)
plt.show()
```

**Output:**

[Figure: 2D Classification Data Visualisation]
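Before splitting, a quick check of the array shapes and class counts can be useful. `make_moons` divides `n_samples` evenly between the two moons, and `stratify=y` in the next step preserves these class proportions in both the training and test sets. This small sketch regenerates the same data.

```python
import numpy as np
from sklearn.datasets import make_moons

# Regenerate the same synthetic data as above
X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

# Feature matrix: 300 samples, 2 features
print(X.shape)         # prints (300, 2)
# Number of samples per class (labels 0 and 1)
print(np.bincount(y))  # prints [150 150]
```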

2. **Train-Test Split and Normalization**

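Because k-NN compares points by distance, the features are standardized first so that neither feature dominates the distance. The scaled data is then split into 70% training and 30% test sets, keeping the class ratio with `stratify=y`.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Standardize the features (zero mean, unit variance); this must happen
# before the split because X_scaled is passed to train_test_split
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into train and test sets, preserving the class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)
```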
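The step above standardizes the full dataset with `scaler.fit_transform(X)`, so the test rows also influence the mean and standard deviation used for scaling. A common alternative, shown here as a hedged sketch, is to split first and fit the scaler on the training portion only; the `_tr`, `_te` and `_s` variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

# Split the raw (unscaled) features first
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Learn mean and standard deviation from the training set only
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
# Reuse the training statistics for the test set
X_te_s = scaler.transform(X_te)

# Training features now have mean close to 0 and standard deviation close to 1
print(X_tr_s.mean(axis=0), X_tr_s.std(axis=0))
```

The test features will not be exactly centered, which is expected: they are transformed as unseen data would be.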

3. **Fit the k-NN Model and Evaluate**

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Train a k-NN classifier with k = 5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict on the test set and evaluate accuracy
y_pred = knn.predict(X_test)
print(f"Test Accuracy (k=5): {accuracy_score(y_test, y_pred):.2f}")
```

**Output:**

Test Accuracy (k=5): 0.87
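To see what `predict` does for a single test point, the fitted model's `kneighbors` method returns the distances to, and training-set indices of, its k nearest neighbors. With the default uniform weights, the predicted class is the majority label among them. This self-contained sketch rebuilds the same split and checks that a manual vote agrees with `predict`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Distances (sorted ascending) and training indices of the first test point's 5 neighbors
distances, indices = knn.kneighbors(X_test[:1])
neighbor_labels = y_train[indices[0]]
print("Neighbor distances:", distances[0].round(3))
print("Neighbor labels:", neighbor_labels)

# Majority vote over the neighbor labels (k = 5 is odd, so a binary vote cannot tie)
manual_pred = np.bincount(neighbor_labels, minlength=2).argmax()
print("Manual vote:", manual_pred, "| knn.predict:", knn.predict(X_test[:1])[0])
```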

4. **Cross-Validation to Choose Best k**

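Choosing k is critical because it controls the bias-variance trade-off: a very small k makes the decision boundary follow noise in the training data (overfitting), while a very large k over-smooths the boundary (underfitting).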
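The trade-off shows up when comparing training and test accuracy for a very small and a very large k. This sketch assumes the same data and split as earlier; k = 1 and k = 100 are arbitrary illustrative values.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)

results = {}
for k in (1, 100):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # score() returns mean accuracy on the given data
    results[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"k={k:>3}: train accuracy={results[k][0]:.2f}, test accuracy={results[k][1]:.2f}")
```

With k = 1 every training point is its own nearest neighbor, so training accuracy is 1.0 regardless of how well the model generalizes. This is why k is selected with cross-validation rather than training accuracy.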

This code performs model selection for the k value in the k-NN algorithm using 5-fold cross-validation:

```python
from sklearn.model_selection import cross_val_score
import numpy as np

# Range of k values to try
k_range = range(1, 21)
cv_scores = []

# Evaluate each k using 5-fold cross-validation
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_scaled, y, cv=5, scoring='accuracy')
    cv_scores.append(scores.mean())

# Plot mean cross-validated accuracy vs. k
plt.figure(figsize=(8, 5))
plt.plot(k_range, cv_scores, marker='o')
plt.title("k-NN Cross-Validation Accuracy vs k")
plt.xlabel("Number of Neighbors: k")
plt.ylabel("Cross-Validated Accuracy")
plt.grid(True)
plt.show()

# Best k: the value with the highest mean accuracy
best_k = k_range[np.argmax(cv_scores)]
print(f"Best k from cross-validation: {best_k}")
```

**Output:**

[Figure: Choosing Best k (cross-validated accuracy vs. k)]

Best k from cross-validation: 6
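The loop above cross-validates on `X_scaled`, that is, on all 300 scaled points, including the rows that later serve as the test set. A hedged alternative is to wrap the scaler and classifier in a pipeline, so the scaler is refitted inside every fold, and let `GridSearchCV` search k on the training split only. The parameter name `kneighborsclassifier__n_neighbors` follows the step name that `make_pipeline` generates automatically; the best k found this way may differ from the value reported above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

# Split the raw data; the test set is kept out of model selection entirely
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scaler + classifier: the scaler is fitted on the training folds of each CV split
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 21))}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print(f"Best k: {best_k}, mean CV accuracy: {search.best_score_:.2f}")
# With refit=True (the default), the best pipeline is retrained on all of X_train
print(f"Test accuracy: {search.score(X_test, y_test):.2f}")
```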

5. **Training with Best k**

```python
# Train the final model with the best k
best_knn = KNeighborsClassifier(n_neighbors=best_k)
best_knn.fit(X_train, y_train)

# Predict on test data
y_pred = best_knn.predict(X_test)
```
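One detail of how `best_k` is chosen: `np.argmax` returns the index of the first maximum, so if several values of k reach the same mean cross-validation accuracy, the smallest of them is selected. A tiny illustration with made-up scores (hypothetical values, not results from this dataset):

```python
import numpy as np

k_range = range(1, 6)
# Hypothetical mean CV accuracies: k = 3 and k = 5 tie for the best score
cv_scores = [0.80, 0.84, 0.88, 0.86, 0.88]

best_k = k_range[np.argmax(cv_scores)]
print(best_k)  # prints 3, the first (smallest) k reaching the maximum
```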

6. **Evaluate Using More Metrics**

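Beyond accuracy, we plot a confusion matrix and print a classification report that includes the precision, recall, F1-score and support for each class: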

```python
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Class 0", "Class 1"])
disp.plot(cmap="Blues")
plt.title(f"Confusion Matrix (k={best_k})")
plt.grid(False)
plt.show()

# Detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=["Class 0", "Class 1"]))
```

**Output:**

[Figure: Confusion Matrix for k = 6]

[Figure: Classification Report]
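To connect the two outputs: for a binary problem, `confusion_matrix(...).ravel()` yields the counts in the order TN, FP, FN, TP, and the precision and recall reported for class 1 are ratios of these counts. The hand-made labels below are hypothetical and only illustrate the arithmetic.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
print(tn, fp, fn, tp)  # prints 1 1 1 2

precision = tp / (tp + fp)  # share of predicted 1s that are truly 1
recall = tp / (tp + fn)     # share of true 1s that were found
print(round(precision, 3), round(recall, 3))  # prints 0.667 0.667

# The same values from scikit-learn (positive class defaults to 1)
print(round(precision_score(y_true, y_hat), 3), round(recall_score(y_true, y_hat), 3))
```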

7. **Visualize Decision Boundary with Best k**

This helps visualize how the model separates classes for the chosen value of k.

```python
# Create a mesh grid covering the scaled feature space, with a margin of 1
x_min, x_max = X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1
y_min, y_max = X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1

xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 300),
    np.linspace(y_min, y_max, 300)
)

# Predict the class of every grid point with the best k
Z = best_knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision regions and overlay the data points
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.3)
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=y, palette="Set1", edgecolor='k')
plt.title(f"Decision Boundary with Best k = {best_k}")
plt.xlabel("Feature 1 (scaled)")
plt.ylabel("Feature 2 (scaled)")
plt.grid(True)
plt.show()
```

**Output:**

[Figure: Decision Boundary with best k = 6]

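The decision boundary follows the curved shape of the two moons, so the model captures the non-linear structure of the data; points lying where the noisy classes overlap remain the hardest to classify.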
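The decision-boundary code evaluates the classifier at every point of a regular grid: `np.meshgrid` builds the grid coordinates, `np.c_[xx.ravel(), yy.ravel()]` stacks them into an (n_points, 2) array that `predict` accepts, and `reshape(xx.shape)` restores the grid layout for `contourf`. A tiny grid makes the shapes visible; the labeling rule below is a hypothetical stand-in for `best_knn.predict`.

```python
import numpy as np

# 3 x-values and 2 y-values give a grid with 2 rows and 3 columns
xx, yy = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([10.0, 20.0]))
print(xx.shape)  # prints (2, 3)

# One row per grid point, in row-major order:
# [0, 10], [1, 10], [2, 10], [0, 20], [1, 20], [2, 20]
points = np.c_[xx.ravel(), yy.ravel()]
print(points.shape)  # prints (6, 2)

# Hypothetical stand-in for model.predict: label 1 where x >= 1
labels = (points[:, 0] >= 1).astype(int)

# Back to grid shape, ready for plt.contourf(xx, yy, Z)
Z = labels.reshape(xx.shape)
print(Z)
```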
