Convert Covariance Matrix to Correlation Matrix using Python

In statistics, covariance measures how two variables vary together, while correlation rescales that relationship to a value between -1 and 1, which makes it easier to interpret. This article discusses the relationship between covariance and correlation and implements functions for computing both in Python.

Relationship Between Covariance and Correlation

Correlation is normalized covariance, as the formula below shows:

\text{corr}(x, y) = \frac{\text{cov}(x, y)}{\sigma_x \cdot \sigma_y}

where \sigma_x and \sigma_y are the standard deviations of x and y, respectively.
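To make the formula concrete, here is a small sketch that applies it to two short arrays. The sample values are hypothetical and chosen only for illustration; the population form of the covariance and standard deviation (dividing by n) is assumed.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Population covariance: mean of the products of deviations from the mean
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Population standard deviations (np.std divides by n by default)
sigma_x, sigma_y = x.std(), y.std()

# corr(x, y) = cov(x, y) / (sigma_x * sigma_y)
corr_xy = cov_xy / (sigma_x * sigma_y)
print(round(corr_xy, 4))  # 0.7746
```

The covariance here is 1.2, a number that depends on the units of x and y, whereas the correlation of about 0.77 is unitless and always lies between -1 and 1.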

Program to convert Covariance to Correlation matrix

We will use the Iris dataset for demonstration. The goal is to first compute the covariance matrix manually and then convert it to a correlation matrix.

**1. Loading and displaying the dataset**

Python `

import numpy as np
import pandas as pd

data = pd.read_csv("iris.csv")
data.head()

`

**Output**

[Image: first rows of the Iris dataset]

In this example, we exclude the target column (species) since we only want numeric features:

Python `

# Drop the last column (species) and keep the features as a NumPy array
data = data.iloc[:, :-1].values

`
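Slicing by position assumes the label is the last column. As a hedged alternative, the sketch below selects columns by dtype instead, so the position of the label does not matter. The small DataFrame and its column names are hypothetical stand-ins for the CSV file, not its actual contents.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of iris.csv; column names are assumptions
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0],
    "sepal_width": [3.5, 3.0, 3.2],
    "species": ["setosa", "setosa", "versicolor"],
})

# Keep only numeric columns, wherever the label column happens to sit
features = df.select_dtypes(include="number").to_numpy()
print(features.shape)  # (3, 2)
```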

**2. Define a Function to Calculate Covariance Between Two Variables**

The covariance between two variables x and y measures how much they vary together.

Python `

# Calculates covariance between two 1D arrays x and y.
def calcCov(x, y):
    mean_x, mean_y = x.mean(), y.mean()  # Mean of each variable
    n = len(x)  # Number of observations
    # Population covariance: sum((x - mean_x) * (y - mean_y)) / n
    return sum((x - mean_x) * (y - mean_y)) / n

`

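**Explanation:** the function subtracts each variable's mean, multiplies the two sets of deviations element-wise, and divides their sum by n. Dividing by n gives the population covariance; the sample covariance would divide by n - 1 instead.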
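A quick way to build intuition for the sign of the covariance is to try the same calculation on toy data. The sketch below uses a hypothetical helper, calc_cov, that mirrors the divide-by-n formula above, together with made-up arrays that rise or fall with x.

```python
import numpy as np

def calc_cov(x, y):
    """Population covariance of two 1D arrays (divides by n)."""
    return np.sum((x - x.mean()) * (y - y.mean())) / len(x)

x = np.array([1.0, 2.0, 3.0, 4.0])
up = 2 * x + 1   # rises with x
down = -x        # falls as x rises

cov_up = calc_cov(x, up)      # positive: the variables move together
cov_down = calc_cov(x, down)  # negative: they move in opposite directions
cov_self = calc_cov(x, x)     # covariance of x with itself is its variance

print(cov_up, cov_down, cov_self)
```

Note that the covariance of a variable with itself equals np.var(x), which is why the diagonal of a covariance matrix holds variances.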

**3. Compute the Covariance Matrix**

We can now build the full covariance matrix for all numeric features.

Python `

# Calculates the covariance matrix for the dataset.
def covMat(data):
    rows, cols = data.shape
    cov_matrix = np.zeros((cols, cols))  # Initialize a square matrix

    # Fill the covariance matrix
    for i in range(cols):
        for j in range(cols):
            cov_matrix[i][j] = calcCov(data[:, i], data[:, j])
    return cov_matrix

# Compute covariance matrix
covMat(data)

`

**Output**

[Image: covariance matrix returned by covMat(data)]

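**Explanation:** covMat loops over every pair of columns and stores their covariance in cov_matrix[i][j]. The result is a symmetric matrix whose diagonal holds the variance of each feature.

Because calcCov divides by n (population covariance), the manual calculation matches NumPy's np.cov only when bias=True is passed; by default np.cov divides by n - 1:

Python `

np.cov(data, rowvar=False, bias=True)

`

rowvar=False ensures columns are treated as features and rows as observations, and bias=True normalizes by n instead of n - 1.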

**Output**

[Image: covariance matrix returned by np.cov]
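The nested loops can also be written as a single matrix product. The sketch below is one possible vectorized version, run on synthetic random data rather than the Iris file, and it checks how a divide-by-n covariance matrix relates to the two normalizations offered by np.cov.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))  # synthetic: 50 observations, 4 features
n = data.shape[0]

# Center each column, then form all pairwise sums of products at once
centered = data - data.mean(axis=0)
cov_pop = centered.T @ centered / n  # population covariance (divide by n)

# bias=True makes np.cov divide by n as well
same_as_bias = np.allclose(cov_pop, np.cov(data, rowvar=False, bias=True))

# The default np.cov divides by n - 1, so it differs by a factor n / (n - 1)
same_as_default = np.allclose(cov_pop * n / (n - 1), np.cov(data, rowvar=False))

print(same_as_bias, same_as_default)  # True True
```

For the 150-row Iris data the factor n / (n - 1) is about 1.0067, so the two versions look similar but are not equal.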

**4. Convert Covariance Matrix to Correlation Matrix**

Correlation is normalized covariance: each covariance is divided by the product of the standard deviations of the two variables involved.

Python `

def corrMat(data):
    rows, cols = data.shape
    corr_matrix = np.zeros((cols, cols))

    # Compute correlation for each pair of variables
    for i in range(cols):
        for j in range(cols):
            x, y = data[:, i], data[:, j]
            # Normalize covariance by product of standard deviations
            corr_matrix[i][j] = calcCov(x, y) / (x.std() * y.std())
    return corr_matrix

# Compute correlation matrix
corrMat(data)

`

**Output**

[Image: correlation matrix returned by corrMat(data)]

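**Explanation:** corrMat divides the covariance of each pair of columns by the product of their standard deviations. calcCov divides by n and NumPy's std() also divides by n by default, so the two normalizations are consistent and every diagonal entry equals 1.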
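An equivalent way to view this step is to standardize each column first and then take the covariance of the standardized columns. The sketch below assumes synthetic data (with one column deliberately made to depend on another) and checks the basic properties a correlation matrix should have.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(60, 3))              # synthetic data, not the Iris file
data[:, 2] = data[:, 0] + 0.5 * data[:, 2]   # make columns 0 and 2 correlated

# Standardize: zero mean and unit (population) standard deviation per column
z = (data - data.mean(axis=0)) / data.std(axis=0)

# The covariance of standardized columns is the correlation matrix
corr = z.T @ z / data.shape[0]

unit_diagonal = np.allclose(np.diag(corr), 1.0)
symmetric = np.allclose(corr, corr.T)
bounded = np.all(np.abs(corr) <= 1.0 + 1e-12)
print(unit_diagonal, symmetric, bounded)  # True True True
```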

**5. Verify Using NumPy**

Python `

np.corrcoef(data, rowvar=False)

`

**Output**

[Image: correlation matrix returned by np.corrcoef]

This gives the same correlation matrix in a single step.
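When only a covariance matrix is available and the raw data is not, the conversion can be done on the matrix itself: the standard deviations are the square roots of its diagonal. The sketch below shows one way to do this; cov_to_corr is a hypothetical helper name, and the data is synthetic.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))        # standard deviation of each variable
    # Entry (i, j) is divided by std[i] * std[j]
    return cov / np.outer(std, std)

rng = np.random.default_rng(2)
data = rng.normal(size=(40, 4))        # synthetic: 40 observations, 4 features

cov = np.cov(data, rowvar=False)       # sample covariance (divides by n - 1)
corr = cov_to_corr(cov)

matches = np.allclose(corr, np.corrcoef(data, rowvar=False))
print(matches)  # True
```

The result does not depend on whether the covariance matrix was computed with n or n - 1, because the same factor appears in the numerator and the denominator and cancels.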