LOOCV (Leave One Out Cross-Validation) in R Programming

Last Updated : 15 Jul, 2025

LOOCV (Leave-One-Out Cross-Validation) is a model evaluation technique for estimating how well a machine learning model predicts unseen data. It is most practical on small datasets. In LOOCV, a single observation is used as the test set while the remaining observations form the training set. This process is repeated for each data point in the dataset, resulting in n training-testing cycles, where n is the number of observations. The prediction errors from all n iterations are averaged to give the overall error estimate.
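The procedure described above can be sketched as an explicit loop. This is only a minimal illustration: it assumes the built-in mtcars dataset and a hypothetical model mpg ~ wt + hp, neither of which is part of this article's Hedonic example.

```r
# Manual LOOCV for a linear model on the built-in mtcars data.
# Assumption: mtcars and the formula mpg ~ wt + hp are illustrative only.
n <- nrow(mtcars)
sq_err <- numeric(n)

for (i in seq_len(n)) {
  train <- mtcars[-i, ]               # all rows except observation i
  test  <- mtcars[i, , drop = FALSE]  # the single held-out observation
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  sq_err[i] <- (test$mpg - pred)^2    # squared prediction error for fold i
}

loocv_mse <- mean(sq_err)  # average over the n folds
print(loocv_mse)
```

Each pass through the loop refits the model on n-1 rows, which is why the cost grows with the number of observations.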

Mathematical Expression

In Leave-One-Out Cross-Validation (LOOCV), each individual observation serves once as the validation set, while the remaining n-1 observations are used for training. Instead of refitting the model n times, LOOCV for linear models can be computed efficiently using the following formula:

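LOOCV Error = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_{ii}} \right)^2

**Where:**

- n is the number of observations
- y_i is the observed response for the i-th observation
- \hat{y}_i is the fitted value for the i-th observation from the model fitted on all n observations
- h_{ii} is the leverage of the i-th observation, that is, the i-th diagonal element of the hat matrix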
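The leverage-based formula can be checked numerically against a brute-force LOOCV. The sketch below assumes the built-in mtcars dataset and a hypothetical model mpg ~ wt + hp. It averages the squared leave-one-out residuals from a single fit and compares the result with cv.glm() from the boot package, which refits the model once per observation when K is left at its default.

```r
library(boot)  # provides cv.glm()

# Assumption: mtcars and this formula are illustrative, not the article's data.
fit <- glm(mpg ~ wt + hp, data = mtcars)

# Shortcut: leave-one-out residuals from one fit, using the leverages h_ii.
h <- hatvalues(fit)
loo_resid <- residuals(fit, type = "response") / (1 - h)
shortcut_mse <- mean(loo_resid^2)

# Brute force: cv.glm() refits the model n times (default K = n).
brute_mse <- cv.glm(mtcars, fit)$delta[1]

print(c(shortcut = shortcut_mse, brute_force = brute_mse))
print(isTRUE(all.equal(shortcut_mse, brute_mse)))
```

The shortcut applies to linear models fitted by least squares. For other model types the refitting approach is generally still needed.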

Implementation in R

We are going to perform a Leave-One-Out Cross Validation (LOOCV) on the Hedonic dataset to evaluate the performance of linear regression models with increasing polynomial degrees.

1. Importing the Dataset

We are loading the Ecdat package, which contains the Hedonic dataset with information on housing prices, and the boot package, which provides tools for resampling methods, including Leave-One-Out Cross Validation (LOOCV). We will then check the structure of the Hedonic dataset.

install.packages("Ecdat")
install.packages("boot")

library(Ecdat)
library(boot)

str(Hedonic)

**Output:**

[Image: structure of the Hedonic data frame as printed by str(Hedonic)]

2. Fitting a Linear Model and Performing LOOCV

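We are fitting a linear regression model that predicts age from the other variables in the dataset. The model is fitted with glm(), which uses the Gaussian family by default and therefore gives an ordinary linear regression, because cv.glm() expects a glm object.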

age.glm <- glm(age ~ mv + crim + zn + indus + chas + nox + rm + tax + dis + rad + ptratio + blacks + lstat, data = Hedonic)

age.glm

**Output:**

[Image: printed summary of the fitted age.glm model]

3. Performing LOOCV and Extracting MSE

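We are running LOOCV with cv.glm() and extracting the Mean Squared Error (MSE) to evaluate model performance. When K is not specified, cv.glm() performs leave-one-out cross-validation. The delta component holds two values: the raw cross-validation estimate of the prediction error and a bias-adjusted version of it.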

cv.mse <- cv.glm(Hedonic, age.glm)
cat("Mean Squared error for the model is: ", cv.mse$delta)

**Output:**

Mean Squared error for the model is: 250.2985 250.2856

4. Fitting Polynomial Models for Increasing Degrees and Performing LOOCV

We are fitting polynomial models with increasing degrees and performing LOOCV to evaluate their performance.

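# Vector to hold the LOOCV MSE for polynomial degrees 1 to 5
cv.mse <- rep(0, 5)

for (i in 1:5) {
  # Fit a model with degree-i polynomial terms for crim and tax
  age.loocv <- glm(age ~ mv + poly(crim, i) + zn + indus + chas + nox + rm + poly(tax, i) + dis + rad + ptratio + blacks + lstat, data = Hedonic)

  # Keep the raw LOOCV estimate (first element of delta)
  cv.mse[i] <- cv.glm(Hedonic, age.loocv)$delta[1]
}

cat("Mean Squared error for the model is: ", cv.mse)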

**Output:**

Mean Squared error for the model is: 250.2985 252.4706 254.7776 299.5546 455.6091
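To read off which polynomial degree performed best, the reported errors can be compared directly. The short sketch below hard-codes the five MSE values printed above, so it runs without the Hedonic data. The variable name best_degree is an illustrative choice.

```r
# LOOCV MSE values reported above for polynomial degrees 1 to 5.
cv.mse <- c(250.2985, 252.4706, 254.7776, 299.5546, 455.6091)

# Index of the smallest cross-validated error, which is also the degree here.
best_degree <- which.min(cv.mse)
cat("Degree with the lowest LOOCV MSE:", best_degree, "\n")
```

Here the degree-1 model has the lowest LOOCV error, and the error grows with each added degree, which suggests that the higher-order terms for crim and tax overfit rather than improve prediction.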

Advantages of LOOCV

**Disadvantage of LOOCV:** Training the model n times, once per observation, is computationally expensive when the dataset is large.
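When refitting n times is too costly, a common alternative is K-fold cross-validation, which cv.glm() supports through its K argument. The sketch below is an assumption-laden illustration: it uses the built-in mtcars dataset, a hypothetical model mpg ~ wt + hp, and K = 8, chosen only because it divides the 32 rows evenly.

```r
library(boot)  # provides cv.glm()

set.seed(1)  # K-fold splits are random, so fix the seed for reproducibility

# Assumption: mtcars and this formula are illustrative, not the article's data.
fit <- glm(mpg ~ wt + hp, data = mtcars)

# 8-fold cross-validation: the model is refitted 8 times instead of 32.
kfold_mse <- cv.glm(mtcars, fit, K = 8)$delta[1]
print(kfold_mse)
```

Unlike LOOCV, the K-fold estimate depends on how the rows are split into folds, so results vary slightly between seeds.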