
Multiclass classification using scikit-learn

Last Updated : 13 Aug, 2025

Multiclass classification is a supervised machine learning task in which each data instance is assigned to one class from three or more possible categories. In scikit-learn, implementing multiclass classification involves preparing the dataset, selecting the appropriate algorithm, training the model and evaluating its performance. Common multiclass classifiers include Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Naive Bayes, each offering a different approach for handling multiple class labels within the data. Real-world examples include digit recognition, species identification and product categorization.
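The workflow described above (prepare the data, select an algorithm, train, evaluate) fits in a few lines. This is a minimal sketch, not the method used in the steps below. It reuses the same four classifiers and hyperparameters but replaces the single 70/30 split with 5-fold cross-validation through scikit-learn's `cross_val_score`, so every sample is used for testing exactly once. The names `models` and `cv_scores` are illustrative.

```python
# Sketch: prepare -> select -> train -> evaluate, using 5-fold cross-validation
# (an alternative to the single train/test split used in the steps below)
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Prepare: Iris features (X) and integer class labels (y)
X, y = datasets.load_iris(return_X_y=True)

# Select: the four classifiers with the settings used in this article
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=2, random_state=0),
    "SVM": SVC(kernel="linear", C=1, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Naive Bayes": GaussianNB(),
}

# Train + evaluate: cross_val_score fits a fresh clone of the model on each fold
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

With a 45-sample test set, a single misclassification shifts accuracy by about 2.2 percentage points. Averaging over folds can therefore give a steadier basis for comparing classifiers.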

Step-by-Step Implementation

Let's walk through a step-by-step implementation of multiclass classification on the Iris dataset using four different classifiers.

Step 1: Import Libraries

First, import the required libraries:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
```

Step 2: Load and Explore the Dataset

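The Iris dataset is a classic collection of 150 flower samples, 50 from each of three Iris species: setosa, versicolor and virginica. Each sample has four numeric features measured in centimeters: sepal length, sepal width, petal length and petal width. In scikit-learn it is loaded with `datasets.load_iris()`. Its `data` array serves as the feature matrix `X` and its `target` array (labels 0, 1 and 2) as the label vector `y` in the steps below.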
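The original step describes the dataset but does not show the loading code, although every later step uses `X` and `y`. A minimal sketch using scikit-learn's bundled `datasets.load_iris()` loader (the variable name `iris` is our choice):

```python
from sklearn import datasets

# Load the Iris dataset bundled with scikit-learn (no download needed)
iris = datasets.load_iris()
X = iris.data    # shape (150, 4): sepal length, sepal width, petal length, petal width (cm)
y = iris.target  # shape (150,): integer class labels 0, 1, 2

print(X.shape, y.shape)             # (150, 4) (150,)
print(iris.target_names.tolist())   # ['setosa', 'versicolor', 'virginica']
```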

Step 3: Split the Data

Split the data into a training set (70%) and a test set (30%, i.e. 45 samples). Setting `random_state=0` makes the shuffle reproducible:

```python
# Hold out 30% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
```
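The split above is purely random, so the three species need not be equally represented in the test set. A common alternative, shown here as a sketch rather than as the article's choice, is to pass `stratify=y`. This keeps the dataset's equal class balance in both subsets: with 150 samples and `test_size=0.3`, that means 15 test samples per species. The name `y_test_plain` is illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Plain split (as in Step 3): test-set class counts depend on the shuffle
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.3, random_state=0)

# Stratified split: stratify=y preserves the 1:1:1 class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

print("Plain test class counts:     ", np.bincount(y_test_plain))
print("Stratified test class counts:", np.bincount(y_test))  # [15 15 15]
```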

Step 4: Model Training and Visualization

**1. Decision Tree Classifier:** A decision tree predicts class labels by learning simple if/else rules arranged in a tree. Each internal node tests one feature against a threshold, and each leaf assigns a class label. Here `max_depth=2` limits the tree to two levels of splits.

```python
from sklearn.tree import DecisionTreeClassifier

# Shallow tree (at most 2 levels of splits) keeps the model easy to interpret
dtree = DecisionTreeClassifier(max_depth=2, random_state=0)
dtree.fit(X_train, y_train)
dtree_preds = dtree.predict(X_test)

# Accuracy = fraction of correct predictions; confusion matrix rows = actual, columns = predicted
dtree_acc = accuracy_score(y_test, dtree_preds)
dtree_cm = confusion_matrix(y_test, dtree_preds)

print("Decision Tree Accuracy:", dtree_acc)

# Plot the confusion matrix as an annotated heatmap
plt.figure(figsize=(4, 3))
sns.heatmap(dtree_cm, annot=True, cmap="Blues", fmt="d")
plt.title("Decision Tree Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```

**Output:**

Decision Tree Accuracy: 0.9111111111111111

[Figure: Decision Tree Classifier Confusion Matrix]
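Because a decision tree is a set of learned if/else rules, scikit-learn's `export_text` can print them in readable form. A minimal sketch that refits the same `max_depth=2` tree on the same split and prints its rules:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Same settings as above: at most two levels of splits
dtree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# Print the learned rules; each leaf line shows the predicted class label (0, 1 or 2)
rules = export_text(dtree, feature_names=iris.feature_names)
print(rules)
print("Tree depth:", dtree.get_depth(), "| leaves:", dtree.get_n_leaves())
```

A depth-2 tree has at most four leaves, which is why it is easy to read but may not be able to separate the two overlapping species completely.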

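**2. Support Vector Machine (SVM) Classifier:** A support vector machine separates classes by finding the hyperplane that maximizes the margin between them in feature space. SVMs are inherently binary, so for three or more classes scikit-learn's `SVC` trains one classifier per pair of classes (one-vs-one) and combines their votes. Here `kernel='linear'` uses a flat hyperplane, and `C=1` controls the trade-off between a wide margin and misclassified training points.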

```python
from sklearn.svm import SVC

# Linear-kernel SVM; multiclass is handled internally with one-vs-one
svm = SVC(kernel='linear', C=1, random_state=0)
svm.fit(X_train, y_train)
svm_preds = svm.predict(X_test)

# Evaluate on the held-out test set
svm_acc = accuracy_score(y_test, svm_preds)
svm_cm = confusion_matrix(y_test, svm_preds)

print("SVM Accuracy:", svm_acc)

# Plot the confusion matrix as an annotated heatmap
plt.figure(figsize=(4, 3))
sns.heatmap(svm_cm, annot=True, cmap="Blues", fmt="d")
plt.title("SVM Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```

**Output:**

SVM Accuracy: 0.9777777777777777

[Figure: SVM Confusion Matrix]
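For three classes, one-vs-one means three binary SVMs: (0 vs 1), (0 vs 2) and (1 vs 2). Setting `decision_function_shape="ovo"` exposes their raw pairwise scores. The sketch below recomputes the majority vote by hand and checks it against `svm.predict`. The voting loop is our illustration of the idea, not scikit-learn's internal code.

```python
from itertools import combinations

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# "ovo" returns one score column per class pair, ordered (0, 1), (0, 2), (1, 2)
svm = SVC(kernel="linear", C=1, decision_function_shape="ovo").fit(X_train, y_train)
pair_scores = svm.decision_function(X_test)
pairs = list(combinations(range(len(svm.classes_)), 2))
print("pairwise score matrix:", pair_scores.shape)  # (45, 3)

# Each binary classifier casts one vote: a positive score favors the first class of its pair
votes = np.zeros((len(X_test), len(svm.classes_)), dtype=int)
for col, (i, j) in enumerate(pairs):
    winner = np.where(pair_scores[:, col] > 0, i, j)
    votes[np.arange(len(X_test)), winner] += 1

# Most votes wins; argmax breaks ties in favor of the lower class index
vote_preds = svm.classes_[votes.argmax(axis=1)]
print("matches svm.predict:", np.array_equal(vote_preds, svm.predict(X_test)))
```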

**3. K-Nearest Neighbors (KNN) Classifier:** A k-nearest neighbors classifier labels a data point with the majority class among its k closest training samples. Closeness is measured by distance in feature space (Euclidean by default in scikit-learn). Here k = 7 (`n_neighbors=7`).

```python
from sklearn.neighbors import KNeighborsClassifier

# Classify each test point by majority vote among its 7 nearest training points
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_test)

# Evaluate on the held-out test set
knn_acc = accuracy_score(y_test, knn_preds)
knn_cm = confusion_matrix(y_test, knn_preds)

print("KNN Accuracy:", knn_acc)

# Plot the confusion matrix as an annotated heatmap
plt.figure(figsize=(4, 3))
sns.heatmap(knn_cm, annot=True, cmap="Blues", fmt="d")
plt.title("KNN Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```

**Output:**

KNN Accuracy: 0.9777777777777777

[Figure: KNN Confusion Matrix]
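The majority-vote rule can be checked directly with `kneighbors`, which returns the indices of each test point's nearest training samples. This is a sketch that assumes the default uniform weights and Euclidean distance. The hand-written vote is for illustration, and it should agree with `knn.predict` except possibly in rare cases where several training points lie at exactly the same distance.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

# Indices of the 7 closest training points for every test point
distances, neighbor_idx = knn.kneighbors(X_test)
neighbor_labels = y_train[neighbor_idx]  # shape (45, 7)

# Majority vote per test point: bincount counts neighbors per class, argmax picks the winner
manual_preds = np.array([np.bincount(row, minlength=3).argmax() for row in neighbor_labels])

agreement = np.mean(manual_preds == knn.predict(X_test))
print("neighbor labels shape:", neighbor_labels.shape)
print("agreement with knn.predict:", agreement)
```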

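**4. Naive Bayes Classifier:** Naive Bayes is a probabilistic classifier based on Bayes' theorem. It assumes the features are conditionally independent given the class and predicts the class with the highest posterior probability. `GaussianNB` additionally assumes that, within each class, every feature follows a normal (Gaussian) distribution.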

```python
from sklearn.naive_bayes import GaussianNB

# Gaussian Naive Bayes: fits a per-class mean and variance for each feature
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_preds = nb.predict(X_test)

# Evaluate on the held-out test set
nb_acc = accuracy_score(y_test, nb_preds)
nb_cm = confusion_matrix(y_test, nb_preds)

print("Naive Bayes Accuracy:", nb_acc)

# Plot the confusion matrix as an annotated heatmap
plt.figure(figsize=(4, 3))
sns.heatmap(nb_cm, annot=True, cmap="Blues", fmt="d")
plt.title("Naive Bayes Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```

**Output:**

Naive Bayes Accuracy: 1.0

[Figure: Naive Bayes Confusion Matrix]