Classifying data using Support Vector Machines (SVMs) in Python

Last Updated : 2 Aug, 2025

Support Vector Machines (SVMs) are supervised learning algorithms used for classification and regression. An SVM finds the decision boundary (hyperplane) that separates the classes with the maximum margin. With kernel functions, it can also fit non-linear boundaries. A wide margin makes the classifier less sensitive to small perturbations of the training points, which tends to improve generalization to unseen data.

Core Concepts

Optimization Objective

SVMs solve a constrained optimization problem with two main goals:

  1. Maximize the margin between classes for better generalization.
  2. Minimize classification errors on the training data. The parameter C sets how strongly these errors are penalized relative to the margin.
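A common way to combine these two goals is the standard soft-margin primal problem. In this notation, introduced here for illustration, the labels are y_i in {-1, +1}, w is the weight vector, b is the bias, and ξ_i are slack variables measuring how far point i violates the margin. The margin width is 2/||w||, so the first term maximizes the margin. The second term, weighted by C, penalizes points that fall inside the margin or on the wrong side of it:

$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;\frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n
$$

A large C makes slack expensive and typically narrows the margin. A small C tolerates more training errors in exchange for a wider margin.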

The Kernel Trick

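Real-world data is rarely linearly separable. The kernel trick handles this by implicitly mapping the data into a higher-dimensional space where a linear separator may exist. The SVM only needs inner products between pairs of points, and a kernel function k(x, z) computes the inner product in that space directly, without ever constructing the transformed features.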
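As a small check of this idea, consider the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)^2 on 2-D inputs. It equals an ordinary dot product after the explicit map phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2). The sketch below computes both and shows that they agree, even though the kernel never builds the 3-D vectors. The helper names phi and poly_kernel are illustrative only.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (homogeneous, no bias term)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

explicit = np.dot(phi(x), phi(z))  # dot product in the 3-D feature space
implicit = poly_kernel(x, z)       # same value computed directly in 2-D
print(explicit, implicit)
```

Both values equal (1·3 + 2·1)^2 = 25, up to floating-point rounding in the explicit version.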

Common Kernel Functions
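The kernels most often used with SVMs are the four that scikit-learn's SVC accepts through its kernel parameter: 'linear', 'poly', 'rbf' (Gaussian), and 'sigmoid'. The sketch below evaluates each one on a tiny made-up array, using the matching functions from sklearn.metrics.pairwise. The gamma, degree, and coef0 values are arbitrary choices for illustration. SVC's own defaults differ (for example, gamma='scale' and coef0=0.0).

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

# Three toy 2-D points (illustrative values only)
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
gamma = 0.5

K_linear = linear_kernel(X)                                      # x . z
K_poly = polynomial_kernel(X, degree=3, gamma=gamma, coef0=1.0)  # (gamma * x . z + coef0)^3
K_rbf = rbf_kernel(X, gamma=gamma)                               # exp(-gamma * ||x - z||^2)
K_sigmoid = sigmoid_kernel(X, gamma=gamma, coef0=0.0)            # tanh(gamma * x . z + coef0)

# Check the RBF entry for rows 0 and 1 by hand: ||x0 - x1||^2 = 1
manual = np.exp(-gamma * np.sum((X[0] - X[1]) ** 2))
print(K_rbf[0, 1], manual)
```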

Implementing SVM Classification in Python

1. Importing Required Libraries

We will import the required Python libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
```

2. Loading the Dataset

We will load the breast cancer dataset and keep only two features, mean radius and mean texture, so that the decision boundary can be plotted in 2-D:

```python
data = load_breast_cancer()
X = data.data[:, [0, 1]]  # first two features: mean radius, mean texture
y = data.target           # 0 = malignant, 1 = benign
```
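To confirm which two columns the slice [:, [0, 1]] keeps, and how the classes are balanced, a quick inspection like the following can be run after loading. This is an optional check, not part of the original pipeline.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data[:, [0, 1]]  # same two-column slice as above
y = data.target

print(data.feature_names[[0, 1]])  # names of the selected features
print(X.shape)                     # (569, 2)
# Number of samples per class, keyed by class name
print(dict(zip(data.target_names, np.bincount(y))))
```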

3. Splitting the Data

We will split the dataset into training (80%) and test (20%) sets:

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
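Because the two classes are imbalanced (more benign than malignant samples), one optional refinement is to pass stratify=y, which is not used in the original code. With stratification, the training and test sets keep roughly the same class ratio as the full dataset. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data[:, [0, 1]]
y = data.target

# Same 80/20 split as above, plus stratify=y so both sets keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
# Fraction of each class in the full data, training set and test set
print(np.bincount(y) / len(y))
print(np.bincount(y_train) / len(y_train))
print(np.bincount(y_test) / len(y_test))
```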

4. Scale the Features

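We will standardize the features to zero mean and unit variance. SVMs are sensitive to feature scale, and the scaler is fit on the training set only, so that no information from the test set leaks into training: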

```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```
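To see what StandardScaler does, the sketch below checks three things. The scaled training columns have mean 0 and standard deviation 1. The test set is transformed with the training statistics, z = (x - mean_train) / std_train, so its mean is close to, but not exactly, 0. Finally, a manual computation of that formula matches the scaler's output.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data[:, [0, 1]]
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fits mean_ and scale_ on training data
X_test_scaled = scaler.transform(X_test)        # applies the training statistics

print("learned means:", scaler.mean_)
print("train mean after scaling:", X_train_scaled.mean(axis=0))
print("train std after scaling:", X_train_scaled.std(axis=0))
print("test mean after scaling:", X_test_scaled.mean(axis=0))

# Manual version of the transform (population std, as StandardScaler uses)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual, X_test_scaled))
```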

5. Train the SVM Classifier

We will train a Support Vector Classifier with a linear kernel and C=1.0:

```python
# random_state has no effect in SVC unless probability=True; it is harmless here
svm_classifier = SVC(kernel='linear', C=1.0, random_state=42)
svm_classifier.fit(X_train_scaled, y_train)
```
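After fitting, a linear SVC exposes the learned hyperplane through coef_ (w) and intercept_ (b), which are available only for kernel='linear'. It also exposes the training points that define the margin through support_vectors_ and n_support_. The sketch below prints these values and the margin width 2/||w| | in standardized units. It also checks that decision_function equals w · x + b. random_state is dropped here because it does not affect this fit.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data[:, [0, 1]]
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

svm_classifier = SVC(kernel='linear', C=1.0)
svm_classifier.fit(X_train_scaled, y_train)

w = svm_classifier.coef_[0]       # hyperplane normal (linear kernel only)
b = svm_classifier.intercept_[0]  # hyperplane offset
print("support vectors per class:", svm_classifier.n_support_)
print("w =", w, "b =", b)
print("margin width (scaled units):", 2 / np.linalg.norm(w))

# For a linear kernel the decision function is w . x + b
manual = X_train_scaled @ w + b
print(np.allclose(manual, svm_classifier.decision_function(X_train_scaled)))
```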

6. Evaluate the Model

We will predict labels and evaluate model performance:

```python
y_pred = svm_classifier.predict(X_test_scaled)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))
```
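The classification report summarizes precision and recall for each class. The underlying counts come from the confusion matrix, which can be printed with scikit-learn's confusion_matrix. That function is an addition not imported in the original code. The sketch below repeats the same pipeline so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data[:, [0, 1]]
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_classifier = SVC(kernel='linear', C=1.0).fit(X_train_scaled, y_train)
y_pred = svm_classifier.predict(X_test_scaled)

# Rows are true classes, columns are predicted classes, in label order 0, 1
cm = confusion_matrix(y_test, y_pred)
print(data.target_names)
print(cm)

# Accuracy is the diagonal (correct predictions) divided by the total
print(np.trace(cm) / cm.sum(), accuracy_score(y_test, y_pred))
```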

Output:

[Figure: SVM - output]

Visualizing the Decision Boundary

We will plot the decision boundary for the trained SVM model:

```python
def plot_decision_boundary(X, y, model, scaler):
    h = 0.02  # step size of the mesh, in original (unscaled) feature units
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Predict on mesh points, scaled the same way as the training data
    Z = model.predict(scaler.transform(np.c_[xx.ravel(), yy.ravel()]))
    Z = Z.reshape(xx.shape)

    # Plot decision boundary and data points
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.xlabel(data.feature_names[0])
    plt.ylabel(data.feature_names[1])
    plt.title('SVM Decision Boundary')
    plt.show()


plot_decision_boundary(X_train, y_train, svm_classifier, scaler)
```

Output:

[Figure: SVM decision boundary]
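The plot above shows the predicted regions but not the margin itself. As an optional variation, the sketch below evaluates decision_function on a grid in the standardized feature space and draws the level sets -1, 0, and +1. Level 0 is the separating line, and the dashed lines at -1 and +1 are the margin edges. Support vectors are circled. Working in scaled units keeps the grid small, and the 200-point grid resolution is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data[:, [0, 1]]
y = data.target
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)
model = SVC(kernel='linear', C=1.0).fit(X_train_scaled, y_train)

# Evaluate the signed decision function on a grid in the scaled feature space
xx, yy = np.meshgrid(
    np.linspace(X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1, 200),
    np.linspace(X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1, 200),
)
Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
# Level 0 is the decision boundary; levels -1 and +1 are the margin edges
ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
ax.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train,
           cmap=plt.cm.coolwarm, edgecolors='k', s=20)
# Circle the support vectors
sv = model.support_vectors_
ax.scatter(sv[:, 0], sv[:, 1], s=80, facecolors='none', edgecolors='k')
ax.set_xlabel(f"{data.feature_names[0]} (standardized)")
ax.set_ylabel(f"{data.feature_names[1]} (standardized)")
ax.set_title('Linear SVM: boundary and margins')
plt.show()
```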

Why Use SVMs

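SVMs work best when the classes are separated by a clear margin, when the feature space is high-dimensional (as in text or image classification), and when the dataset is moderate in size. Training a kernel SVM means solving a quadratic optimization problem whose cost grows at least quadratically with the number of samples, so very large datasets quickly become expensive.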

Advantages

Limitations

Support Vector Machines are a robust choice for classification, especially when classes are well separated. Because they maximize the margin around the decision boundary, they tend to generalize well, provided the features are scaled and C (and, for non-linear kernels, the kernel parameters) is tuned.

Performance Optimization Tips

For Large Datasets
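Kernel SVC training scales poorly with the number of samples. When a linear boundary is acceptable, a common option for large datasets is to switch to a linear-only solver. The sketch below uses synthetic data from make_classification, with sizes chosen only for illustration. It trains scikit-learn's LinearSVC (based on liblinear, with squared hinge loss by default) and an SGDClassifier with loss='hinge'. The SGD model fits a linear SVM by stochastic gradient descent and also supports partial_fit for data that does not fit in memory at once.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for a large dataset (sizes are illustrative)
X, y = make_classification(
    n_samples=20_000, n_features=20, n_informative=10,
    n_clusters_per_class=1, random_state=0,
)

# liblinear solver: no kernel matrix, so it scales much better than SVC(kernel='linear')
# dual=False is the usual choice when n_samples > n_features
linear_svm = make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False))
linear_svm.fit(X, y)

# Linear SVM trained by stochastic gradient descent on the hinge loss
sgd_svm = make_pipeline(
    StandardScaler(), SGDClassifier(loss='hinge', alpha=1e-4, random_state=0)
)
sgd_svm.fit(X, y)

print("LinearSVC train accuracy:", linear_svm.score(X, y))
print("SGD (hinge) train accuracy:", sgd_svm.score(X, y))
```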

Memory Management
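For kernel SVC in scikit-learn, two settings relate directly to memory. The first is cache_size, the size of the kernel cache in MB (default 200). The second is the input layout: libsvm works on C-contiguous float64 arrays, and data in any other layout is copied internally before training. The sketch below uses the full 30-feature breast cancer data. The value cache_size=500 is an arbitrary illustration, not a recommendation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
y = data.target

# Pass a C-contiguous float64 array up front so libsvm does not need to copy it
X = np.ascontiguousarray(X, dtype=np.float64)

# cache_size is the kernel cache in MB; a larger cache can speed up training
# on bigger datasets at the cost of more RAM
clf = SVC(kernel='rbf', C=1.0, cache_size=500)
clf.fit(X, y)

print(X.flags['C_CONTIGUOUS'], X.dtype, clf.n_support_.sum())
```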

Preprocessing Best Practices
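One way to keep scaling and model selection consistent is to place StandardScaler and SVC in a single Pipeline and tune it with GridSearchCV. The scaler is then refit on each cross-validation training fold, so no validation fold leaks into preprocessing. The sketch below uses all 30 breast cancer features, unlike the two-feature example above. The parameter grid is an illustrative assumption, not a tuned recommendation, and gamma is simply ignored when the kernel is linear.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# Scaling lives inside the pipeline, so it is fit only on training folds
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC()),
])

# Illustrative search space (step name + '__' + parameter name)
param_grid = {
    'svc__kernel': ['linear', 'rbf'],
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```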