Linear Discriminant Analysis in R Programming

Last Updated : 14 Apr, 2026

Linear Discriminant Analysis (LDA) is a statistical method widely used in machine learning for classification and dimensionality reduction. It builds linear combinations of the input features that maximize the distance between class means while minimizing the variation within each class, which amounts to finding the line (or hyperplane in higher dimensions) that best separates the classes.
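To make this idea concrete, here is a minimal sketch in base R that computes a two-class discriminant direction by hand. It is not part of the walkthrough that follows: the choice of two Iris species and two features, and the names X1, X2, Sw, w and fisher_ratio, are assumptions made only for illustration.

```r
# Two-class Fisher discriminant direction computed by hand (illustrative sketch).
data("iris")

# Assumption: two species and two features keep the example small.
feats <- c("Petal.Length", "Petal.Width")
X1 <- as.matrix(iris[iris$Species == "versicolor", feats])
X2 <- as.matrix(iris[iris$Species == "virginica", feats])

# Class means
m1 <- colMeans(X1)
m2 <- colMeans(X2)

# Pooled within-class scatter matrix
Sw <- crossprod(sweep(X1, 2, m1)) + crossprod(sweep(X2, 2, m2))

# Direction that maximizes between-class separation relative to within-class scatter
w <- solve(Sw, m1 - m2)

# Project both classes onto w and compare the gap between the projected means
# with the spread of each class along w
z1 <- as.vector(X1 %*% w)
z2 <- as.vector(X2 %*% w)
fisher_ratio <- (mean(z1) - mean(z2))^2 / (var(z1) + var(z2))

print(w)
print(fisher_ratio)
```

Projecting onto any other direction, for example a single raw feature, should give a ratio no larger than the one obtained along w.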

Assumptions of Linear Discriminant Analysis (LDA)

Where is LDA Used?

LDA is widely applied in various real-world scenarios, including:

Implementation of Linear Discriminant Analysis (LDA)

We implement Linear Discriminant Analysis using the lda() function from the MASS package on the Iris dataset and visualize class separation with synthetic data.

1. Installing and Loading Required Packages

We install and load all required packages to preprocess data, build the model and visualize results.

install.packages("MASS")
install.packages("tidyverse")
install.packages("caret")
install.packages("mvtnorm")

library(MASS)
library(tidyverse)
library(caret)
library(mvtnorm)
library(ggplot2)

2. Loading and Splitting the Dataset

We load the Iris dataset and split it into a training set (80%) and a test set (20%), stratified by species.

data("iris")
set.seed(123)  # make the split reproducible

# Stratified 80/20 split by species
training.individuals <- iris$Species %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- iris[training.individuals, ]
test.data <- iris[-training.individuals, ]
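A quick sanity check on the split can be sketched as follows. Because createDataPartition() samples within each level of a factor outcome, every species should appear in the same proportion in both sets. The variable name idx is an assumption for this example.

```r
library(caret)

data("iris")
set.seed(123)

# Same stratified 80/20 split as in the walkthrough
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train.data <- iris[idx, ]
test.data <- iris[-idx, ]

# Number of rows in each set
nrow(train.data)
nrow(test.data)

# Class counts per set; stratified sampling should keep the species balanced
table(train.data$Species)
table(test.data$Species)
```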

3. Normalizing the Dataset

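Standardization is not strictly required for LDA, since in principle its predictions do not change when the features are linearly rescaled, but centering and scaling puts the discriminant coefficients on a comparable scale when features are measured in very different units. The centering and scaling parameters are estimated on the training set and then applied to both sets.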

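# Estimate centering and scaling parameters on the training set only
preproc.parameter <- train.data %>%
  preProcess(method = c("center", "scale"))

# Apply the same transformation to both sets
train.transform <- preproc.parameter %>% predict(train.data)
test.transform <- preproc.parameter %>% predict(test.data)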
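As a rough illustration of what centering and scaling do, the sketch below standardizes the features in base R using training-set statistics only, then applies the same transformation to the test set. The simple random split and the names num_cols, mu and sigma are assumptions for this example; the walkthrough itself relies on preProcess().

```r
data("iris")
set.seed(123)

# Assumption: a simple random 80/20 split keeps the sketch self-contained
idx <- sample(nrow(iris), size = 120)
train <- iris[idx, ]
test <- iris[-idx, ]

# Only the numeric columns are standardized; Species is left untouched
num_cols <- sapply(train, is.numeric)

# Estimate mean and standard deviation on the training data only
mu <- colMeans(train[, num_cols])
sigma <- apply(train[, num_cols], 2, sd)

# Apply the same training-set parameters to both sets
train_std <- train
test_std <- test
train_std[, num_cols] <- scale(train[, num_cols], center = mu, scale = sigma)
test_std[, num_cols] <- scale(test[, num_cols], center = mu, scale = sigma)

# Training features now have mean 0 and sd 1 (up to rounding);
# test features are only approximately standardized
round(colMeans(train_std[, num_cols]), 3)
round(apply(train_std[, num_cols], 2, sd), 3)
round(colMeans(test_std[, num_cols]), 3)
```

Reusing the training-set mean and standard deviation on the test set avoids leaking information from the test data into the preprocessing step.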

4. Fitting the LDA Model

We train the LDA model using the transformed training dataset.

model <- lda(Species ~ ., data = train.transform)

5. Making Predictions

We use the model to predict species on the test set.

predictions <- model %>% predict(test.transform)
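The object returned by predict() on an lda fit is a list with three components. The sketch below shows how they might be inspected; the simple random split and the names fit and pred are assumptions made to keep the example self-contained.

```r
library(MASS)

data("iris")
set.seed(123)

# Assumption: a simple random 80/20 split instead of the caret split
idx <- sample(nrow(iris), size = 120)
fit <- lda(Species ~ ., data = iris[idx, ])
pred <- predict(fit, iris[-idx, ])

names(pred)                     # "class" "posterior" "x"
head(pred$class)                # predicted species for each test row
head(round(pred$posterior, 3))  # posterior probability of each species
head(pred$x)                    # scores on the linear discriminants (LD1, LD2)
```

The class component is the label with the highest posterior probability, while x holds the coordinates of each observation in the discriminant space, which is what makes LDA usable for dimensionality reduction.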

6. Checking Model Accuracy

We compute the fraction of test-set predictions that match the actual labels.

mean(predictions$class == test.transform$Species)

Output:

0.966666666666667
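A single accuracy figure does not show which species are confused with each other. One way to see that, sketched here with base R's table(), is a confusion matrix. The simple random split and the names fit, pred and conf_mat are assumptions for this example, so the counts need not match the accuracy reported above.

```r
library(MASS)

data("iris")
set.seed(123)

# Assumption: a simple random 80/20 split instead of the caret split
idx <- sample(nrow(iris), size = 120)
test <- iris[-idx, ]
fit <- lda(Species ~ ., data = iris[idx, ])
pred <- predict(fit, test)

# Rows are predicted labels, columns are actual labels
conf_mat <- table(Predicted = pred$class, Actual = test$Species)
conf_mat

# Overall accuracy is the share of observations on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```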

7. Viewing Model Details

We print the model output including prior probabilities, group means and coefficients.

model

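Output: the printed model, listing the prior probabilities, group means and coefficients of linear discriminants (shown as an image in the original).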
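The individual parts of the printed summary can also be accessed directly from the fitted object. The following sketch fits the model on the full Iris dataset for brevity, which is an assumption of this example rather than what the walkthrough does; fit and prop_trace are illustrative names.

```r
library(MASS)

data("iris")

# Assumption: fit on the full dataset to keep the example short
fit <- lda(Species ~ ., data = iris)

fit$prior    # prior probability of each class
fit$means    # mean of each feature within each class
fit$scaling  # coefficients of the linear discriminants

# Share of between-class variance captured by each discriminant
prop_trace <- fit$svd^2 / sum(fit$svd^2)
round(prop_trace, 4)
```

By default the priors are the class proportions in the training data, so a balanced dataset gives equal priors.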

8. Plotting the Output

We generate synthetic Gaussian samples to visualize how well LDA can separate two classes.

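# Shared covariance matrix for both classes
var_covar <- matrix(data = c(1.5, 0.4, 0.4, 1.5), nrow = 2)

# Draw 400 points for class 1 and 600 points for class -1
Xplus1 <- rmvnorm(400, mean = c(5, 5), sigma = var_covar)
Xminus1 <- rmvnorm(600, mean = c(3, 3), sigma = var_covar)

# Combine the samples and their labels into one data frame
Y_samples <- c(rep(1, 400), rep(-1, 600))
dataset <- as.data.frame(cbind(rbind(Xplus1, Xminus1), Y_samples))
colnames(dataset) <- c("X1", "X2", "Y")
dataset$Y <- as.character(dataset$Y)

# Scatter plot of the two features, colored by class
ggplot(data = dataset) +
  geom_point(aes(X1, X2, color = Y))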

Output:

[Figure: scatter plot of X1 against X2, with points colored by class Y]

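The output is a scatter plot of the two synthetic classes. Each class forms a cluster around its own mean, and because both classes share the same covariance matrix and differ only in their means, the data match the setting in which a linear boundary such as the one LDA produces is appropriate.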
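The plot shows only the raw data. To check how well LDA actually separates such classes, one could fit a model on the same kind of sample, as in the sketch below. It swaps in MASS::mvrnorm() for mvtnorm::rmvnorm() so that only the MASS package is needed, and the names synthetic_fit, synthetic_pred and train_acc are assumptions for this example.

```r
library(MASS)

set.seed(123)

# Same means and shared covariance matrix as in the walkthrough
# (assumption: sampled with MASS::mvrnorm rather than mvtnorm::rmvnorm)
var_covar <- matrix(c(1.5, 0.4, 0.4, 1.5), nrow = 2)
Xplus1 <- mvrnorm(400, mu = c(5, 5), Sigma = var_covar)
Xminus1 <- mvrnorm(600, mu = c(3, 3), Sigma = var_covar)

dataset <- data.frame(
  X1 = c(Xplus1[, 1], Xminus1[, 1]),
  X2 = c(Xplus1[, 2], Xminus1[, 2]),
  Y = factor(c(rep("1", 400), rep("-1", 600)))
)

# Fit LDA on the synthetic data and predict on the same points
synthetic_fit <- lda(Y ~ X1 + X2, data = dataset)
synthetic_pred <- predict(synthetic_fit, dataset)

# Share of points classified correctly (training accuracy, so optimistic)
train_acc <- mean(synthetic_pred$class == dataset$Y)
train_acc

# With two classes there is a single linear discriminant
synthetic_fit$scaling
```

Because the two clusters overlap, some points fall on the wrong side of the boundary, so the accuracy stays below 1 even though the data satisfy LDA's assumptions.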