Multivariate Regression

Last Updated : 24 Mar, 2026

Multivariate Regression is a technique used when we need to predict more than one output variable at the same time. Instead of building separate models for each target, a single model learns how input features are connected to multiple outputs together. This is especially useful when the outputs are related to each other and may influence one another.


For example, we could use study hours and attendance to predict both Math and Science marks at the same time.

Architecture

To clearly understand multivariate regression, let’s first recall simple linear regression. In simple linear regression, we predict one output using one input. But in multivariate regression, we predict multiple outputs together using input features. Instead of a single equation, we use matrix form to handle multiple targets at once.

The mathematical form of multivariate regression is:

Y = XB + \epsilon

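This equation looks compact, but it describes all of the outputs in a single matrix computation. Here:

- Y is the n × m matrix of outputs (n samples, m target variables).
- X is the n × p matrix of inputs (p columns, usually including a leading column of ones so that the model has an intercept).
- B is the p × m coefficient matrix; column j holds the coefficients for output j.
- \epsilon is the n × m matrix of random errors (noise).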
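Written out element by element, with n samples, p input columns, and m outputs, the same equation shows that every output uses the same rows of X but has its own column of B:

y_{ij} = \sum_{k=1}^{p} x_{ik}\, b_{kj} + \epsilon_{ij}, \qquad i = 1, \dots, n, \quad j = 1, \dots, m

Taking one column at a time gives m regressions that share the same inputs:

y_j = X b_j + \epsilon_j

where y_j, b_j, and \epsilon_j are the j-th columns of Y, B, and \epsilon. In the study-hours example, m = 2 (Math and Science marks) and, with an intercept column, p = 3.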

Working of Multivariate Regression

Multivariate regression works in a structured sequence. First, we organize the data into matrices. Then we compute the coefficient matrix with the normal equation. Finally, we use those coefficients to generate predictions for all output variables at once.

Step 1: Prepare Input and Output Matrices

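We organize the dataset into two main matrices:

- X (input matrix): one row per sample and one column per feature, plus a column of ones if an intercept is fitted.
- Y (output matrix): one row per sample and one column per target variable.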
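As a small illustration of this step, the sketch below assembles X and Y for the study-hours example. The variable names and all values are hypothetical: they are drawn at random with NumPy purely to show the matrix layout, not taken from any real dataset.

```python
import numpy as np

# Hypothetical toy data for 6 students, generated at random (not real records).
rng = np.random.default_rng(0)
n = 6
study_hours = rng.uniform(1, 10, size=n)        # input feature 1
attendance = rng.uniform(50, 100, size=n)       # input feature 2 (percent)
math_marks = rng.uniform(40, 100, size=n)       # output 1
science_marks = rng.uniform(40, 100, size=n)    # output 2

# X: one row per student; a leading column of ones gives each output an intercept.
X = np.column_stack([np.ones(n), study_hours, attendance])

# Y: one row per student, one column per target variable.
Y = np.column_stack([math_marks, science_marks])

print(X.shape, Y.shape)  # (6, 3) (6, 2)
```

The same layout scales to any number of samples, features, and targets; only the number of columns in X and Y changes.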

Step 2: Estimate Coefficients

To find the best coefficient matrix B, we use the normal equation, extended to multiple outputs. It gives the B that minimizes the sum of squared prediction errors over all outputs:

B = (X^T X)^{-1} X^T Y

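Here:

- X^T is the transpose of X, so X^T X is a square matrix with one row and one column per column of X.
- (X^T X)^{-1} is its inverse, which exists only when the columns of X are linearly independent (no feature is an exact linear combination of the others).
- The result B has one row per column of X and one column per output; each column is the least-squares solution for that output.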
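A minimal NumPy sketch of this step follows, assuming synthetic data generated from known "true" coefficients (the sizes and the noise level are arbitrary choices). It evaluates the normal equation literally and cross-checks it against `np.linalg.solve` and `np.linalg.lstsq`, which reach the same B without explicitly inverting X^T X.

```python
import numpy as np

# Synthetic data (assumption for illustration): 200 samples, 3 features, 2 outputs.
rng = np.random.default_rng(42)
n, n_features, m = 200, 3, 2
features = rng.normal(size=(n, n_features))
X = np.column_stack([np.ones(n), features])     # leading column of ones for the intercept
B_true = rng.normal(size=(X.shape[1], m))       # "true" coefficient matrix, 4 x 2
Y = X @ B_true + 0.1 * rng.normal(size=(n, m))  # Y = XB + epsilon

# Normal equation written literally: B = (X^T X)^{-1} X^T Y
B_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# Same estimate without forming the inverse (numerically preferable).
B_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Least-squares solver as a cross-check.
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(B_inv.shape)  # (4, 2): one column of coefficients per output
print(np.allclose(B_inv, B_solve), np.allclose(B_inv, B_lstsq))  # True True
```

In practice, solving the linear system is usually preferred over computing the inverse, because it is more stable when X^T X is close to singular.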

Step 3: Make Predictions

Once we calculate B, we generate predictions using:

\hat{Y} = XB
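The sketch below shows this prediction step on synthetic data, again assuming made-up sizes and noise. Once B is estimated, one matrix product returns every output for every new sample; the new rows must follow the same column layout as X, including the leading 1 for the intercept.

```python
import numpy as np

# Synthetic training data (assumption for illustration): 2 features, 3 outputs.
rng = np.random.default_rng(7)
n, m = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # n x 3, with intercept column
B_true = rng.normal(size=(3, m))
Y = X @ B_true + 0.05 * rng.normal(size=(n, m))

# Estimate B with the normal equation (solved rather than inverted).
B = np.linalg.solve(X.T @ X, X.T @ Y)

# Predict all outputs for two new samples at once: Y_hat = X_new B.
X_new = np.array([[1.0, 0.5, -1.0],
                  [1.0, 2.0, 0.0]])   # each row starts with 1 for the intercept
Y_hat = X_new @ B
print(Y_hat.shape)  # (2, 3): one row per new sample, one column per output

# In-sample residuals E = Y - XB; with an intercept, each column averages to zero.
residuals = Y - X @ B
```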

Implementation Using Scikit-Learn

We will implement multivariate regression on a synthetic dataset generated with Scikit-learn's make_regression function.

Step 1: Import Required Libraries

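```python
import numpy as np
from sklearn.datasets import make_regression          # synthetic data generator
from sklearn.model_selection import train_test_split  # train/test split helper
from sklearn.linear_model import LinearRegression     # ordinary least squares
from sklearn.metrics import r2_score                  # coefficient of determination
```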

Step 2: Generate a Multi-Output Dataset

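```python
# 500 samples, 5 input features, 3 target variables, Gaussian noise with standard deviation 10
X, Y = make_regression(
    n_samples=500, n_features=5, n_targets=3, noise=10, random_state=42
)
```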
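To see how this dataset maps onto the matrices from the theory section, the sketch below repeats the same call with `coef=True`, a make_regression option that also returns the ground-truth coefficients used to generate Y. The extra `coef` variable is only for inspection.

```python
from sklearn.datasets import make_regression

# Same call as above, also asking for the ground-truth coefficients.
X, Y, coef = make_regression(
    n_samples=500, n_features=5, n_targets=3, noise=10, random_state=42, coef=True
)

# With n_targets > 1, Y is 2-D (one column per output) and coef is a feature-by-target matrix.
print(X.shape, Y.shape, coef.shape)  # (500, 5) (500, 3) (5, 3)
```

Here coef plays the role of B (without an intercept row, since make_regression's default bias is 0).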

Step 3: Split the Dataset

```python
# Hold out 20% of the samples for testing
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```

Step 4: Train the Multivariate Model

```python
# LinearRegression accepts a 2-D target array and fits all outputs at once
model = LinearRegression()
model.fit(X_train, Y_train)
```

Output: the model is created and fitted to the training data.
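The fitted model stores the coefficient matrix in scikit-learn's `coef_` and `intercept_` attributes. The self-contained sketch below (fitted on the full synthetic dataset for brevity) shows their shapes and rebuilds B in the Y = XB form, with the intercepts as the first row and a column of ones added to X.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, Y = make_regression(n_samples=500, n_features=5, n_targets=3, noise=10, random_state=42)
model = LinearRegression().fit(X, Y)

# For multi-output targets, coef_ is (n_targets, n_features), i.e. B transposed
# without the intercept row; intercept_ holds one intercept per target.
print(model.coef_.shape, model.intercept_.shape)  # (3, 5) (3,)

# Rebuild the full coefficient matrix B (intercept row on top) used in Y_hat = XB.
B = np.vstack([model.intercept_, model.coef_.T])   # shape (6, 3)
X_design = np.column_stack([np.ones(len(X)), X])   # add the column of ones
print(np.allclose(X_design @ B, model.predict(X)))  # True
```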

Step 5: Make Predictions

```python
# Predict all three outputs for every test sample; Y_pred has shape (100, 3)
Y_pred = model.predict(X_test)
```

Step 6: Evaluate Performance

With multioutput='raw_values', r2_score returns one R² value for each target variable instead of a single averaged score.

```python
# One R² score per target variable
r2 = r2_score(Y_test, Y_pred, multioutput='raw_values')
print("R² for each output:", r2)
```

Output: an array of three R² scores, one for each target variable.
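If a single summary number is needed, r2_score can also combine the per-target scores. This sketch repeats the pipeline above and compares the `multioutput` options: 'uniform_average' (the default) is the plain mean of the per-target scores, while 'variance_weighted' weights each target by its variance.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, Y = make_regression(n_samples=500, n_features=5, n_targets=3, noise=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
Y_pred = LinearRegression().fit(X_train, Y_train).predict(X_test)

per_target = r2_score(Y_test, Y_pred, multioutput='raw_values')       # one score per output
average = r2_score(Y_test, Y_pred, multioutput='uniform_average')     # plain mean (default)
weighted = r2_score(Y_test, Y_pred, multioutput='variance_weighted')  # weighted by target variance

print(per_target.shape)                        # (3,)
print(np.isclose(average, per_target.mean()))  # True
```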

You can download the code from here

Multiple Linear Regression vs Multivariate Regression

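| Aspect | Multiple Linear Regression | Multivariate Regression |
|---|---|---|
| Outputs | One dependent variable | More than one dependent variable |
| Inputs | Multiple inputs allowed | Multiple inputs allowed |
| Model Structure | One model per output (several outputs need several separate models) | Single combined model |
| Output Relationship | Does not consider relations between outputs | Treats outputs jointly, e.g. through a shared error covariance; with ordinary least squares the coefficient estimates equal those of separate fits |
| Coefficient Form | Coefficient vector | Coefficient matrix (one column per output) |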
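The table's last rows can be checked numerically. This sketch, using the same synthetic data as the implementation above, fits one multi-output LinearRegression and, separately, one single-output LinearRegression per target, then compares the coefficients. Under ordinary least squares they coincide, because each column of B = (X^T X)^{-1} X^T Y depends only on the matching column of Y; the combined model's advantage is convenience and joint treatment of the outputs, not different point estimates.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, Y = make_regression(n_samples=500, n_features=5, n_targets=3, noise=10, random_state=42)

# One combined multi-output model.
joint = LinearRegression().fit(X, Y)

# One separate single-output model per target column.
separate = [LinearRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]
coef_separate = np.vstack([mdl.coef_ for mdl in separate])           # (3, 5)
intercept_separate = np.array([mdl.intercept_ for mdl in separate])  # (3,)

print(np.allclose(joint.coef_, coef_separate))            # True
print(np.allclose(joint.intercept_, intercept_separate))  # True
```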

Advantages

Limitations

Multivariate regression is useful, but it has certain assumptions and practical challenges.