Program to Find Correlation Coefficient
Last Updated : 21 Mar, 2025
The **correlation coefficient** is a statistical measure of the strength and direction of the linear relationship between two variables. It quantifies how closely changes in one variable track changes in the other. The coefficient computed here is Pearson's correlation coefficient (sometimes referred to as the **cross-correlation coefficient**), and it always lies between **-1** and **+1**:
- **-1**: Perfect negative correlation (as one variable increases, the other decreases along a straight line).
- **0**: No linear correlation (the variables may still be related in a nonlinear way).
- **+1**: Perfect positive correlation (both variables increase or decrease together along a straight line).
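To make these three reference values concrete, the sketch below evaluates the coefficient on tiny made-up datasets. The helper name `pearson_r` and the sample lists are illustrative assumptions, not part of the original program; the helper uses the mean-centered form of the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # y rises linearly with x: r close to +1
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # y falls linearly as x rises: r close to -1
print(pearson_r(xs, [2, 1, 0, 1, 2]))           # symmetric V shape: r close to 0
```

The last dataset follows a clear V-shaped pattern, yet its coefficient is essentially zero, which illustrates that a value near 0 rules out only a linear relationship.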
**Formula for Correlation Coefficient**
The correlation coefficient (**r**) is calculated using the formula:
$$r=\frac{n\sum xy-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2}-\left(\sum x\right)^{2}\right]\left[n\sum y^{2}-\left(\sum y\right)^{2}\right]}}$$
Where:
- **n** = number of data points
- **x, y** = data values of the two variables
- **Σx, Σy** = sums of the x and y values
- **Σxy** = sum of the products of corresponding x and y values
- **Σx², Σy²** = sums of the squares of the x and y values
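As a side note, the formula above is algebraically the same as the mean-centered (covariance) form of the coefficient. The symbols $\bar{x}$ and $\bar{y}$ for the sample means are notation introduced here, not used elsewhere in the article. The first two lines rewrite the numerator and one denominator factor; the common factors of $n$ then cancel:

$$
\begin{aligned}
n\sum xy-\left(\sum x\right)\left(\sum y\right) &= n\sum (x-\bar{x})(y-\bar{y}), \\
n\sum x^{2}-\left(\sum x\right)^{2} &= n\sum (x-\bar{x})^{2}, \\
r &= \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^{2}\,\sum (y-\bar{y})^{2}}}
\end{aligned}
$$

The sums-based form is convenient for a single pass over the data, while the mean-centered form makes it easier to see that r is a normalized covariance.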
**Example Calculation**
Let's calculate the correlation coefficient for the given dataset:
| X | Y |
|---|---|
| 15 | 25 |
| 18 | 25 |
| 21 | 27 |
| 24 | 31 |
| 27 | 32 |
| ΣX = 105 | ΣY = 140 |
Additional calculations:
| X × Y | X² | Y² |
|---|---|---|
| 375 | 225 | 625 |
| 450 | 324 | 625 |
| 567 | 441 | 729 |
| 744 | 576 | 961 |
| 864 | 729 | 1024 |
| Σ(X × Y) = 3000 | ΣX² = 2295 | ΣY² = 3964 |
Now, applying the formula:
- $r = \frac{5 \times 3000 - 105 \times 140}{\sqrt{(5 \times 2295 - 105^2) \times (5 \times 3964 - 140^2)}}$
- $r = \frac{300}{\sqrt{450 \times 220}}$
- $r \approx 0.953463$
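The arithmetic above can be reproduced step by step with a short script. This is only a sketch for checking the table totals; the variable names are chosen here for illustration.

```python
import math

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
n = len(X)

# Column totals from the two tables.
sum_x = sum(X)                              # 105
sum_y = sum(Y)                              # 140
sum_xy = sum(x * y for x, y in zip(X, Y))   # 3000
sum_x2 = sum(x * x for x in X)              # 2295
sum_y2 = sum(y * y for y in Y)              # 3964

# Numerator and the two bracketed denominator terms of the formula.
numerator = n * sum_xy - sum_x * sum_y      # 300
x_term = n * sum_x2 - sum_x ** 2            # 450
y_term = n * sum_y2 - sum_y ** 2            # 220

r = numerator / math.sqrt(x_term * y_term)
print(f"{r:.6f}")                           # 0.953463
```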
**Example Inputs & Outputs**
**Example 1**
**Input:**
X = {43, 21, 25, 42, 57, 59}
Y = {99, 65, 79, 75, 87, 81}
**Output:**
r = **0.529809**
**Example 2**
**Input:**
X = {15, 18, 21, 24, 27}
Y = {25, 25, 27, 31, 32}
**Output:**
r = **0.953463**
**Program to Compute the Correlation Coefficient**
C++
```cpp
// Program to find the correlation coefficient
#include <cmath>
#include <iostream>
using namespace std;

// Returns the correlation coefficient of X and Y (each of length n).
double correlationCoefficient(const int X[], const int Y[], int n)
{
    // long long totals avoid integer overflow on larger inputs.
    long long sum_X = 0, sum_Y = 0, sum_XY = 0;
    long long squareSum_X = 0, squareSum_Y = 0;

    for (int i = 0; i < n; i++)
    {
        // sum of elements of array X.
        sum_X += X[i];
        // sum of elements of array Y.
        sum_Y += Y[i];
        // sum of X[i] * Y[i].
        sum_XY += (long long)X[i] * Y[i];
        // sums of squares of array elements.
        squareSum_X += (long long)X[i] * X[i];
        squareSum_Y += (long long)Y[i] * Y[i];
    }

    // use formula for calculating correlation coefficient.
    double numerator = (double)(n * sum_XY - sum_X * sum_Y);
    double denominator = sqrt((double)(n * squareSum_X - sum_X * sum_X)
                              * (double)(n * squareSum_Y - sum_Y * sum_Y));
    return numerator / denominator;
}

// Driver function
int main()
{
    int X[] = {15, 18, 21, 24, 27};
    int Y[] = {25, 25, 27, 31, 32};

    // Find the size of array.
    int n = sizeof(X) / sizeof(X[0]);

    // Function call to correlationCoefficient.
    cout << correlationCoefficient(X, Y, n);
    return 0;
}
```
Java
```java
// Java program to find the correlation coefficient
class GFG {
    // Returns the correlation coefficient of X and Y (each of length n).
    static double correlationCoefficient(int X[], int Y[], int n)
    {
        // long totals avoid integer overflow on larger inputs.
        long sum_X = 0, sum_Y = 0, sum_XY = 0;
        long squareSum_X = 0, squareSum_Y = 0;

        for (int i = 0; i < n; i++)
        {
            // sum of elements of array X.
            sum_X += X[i];
            // sum of elements of array Y.
            sum_Y += Y[i];
            // sum of X[i] * Y[i].
            sum_XY += (long) X[i] * Y[i];
            // sums of squares of array elements.
            squareSum_X += (long) X[i] * X[i];
            squareSum_Y += (long) Y[i] * Y[i];
        }

        // use formula for calculating correlation coefficient.
        double numerator = (double) (n * sum_XY - sum_X * sum_Y);
        double denominator = Math.sqrt(
            (double) (n * squareSum_X - sum_X * sum_X)
            * (double) (n * squareSum_Y - sum_Y * sum_Y));
        return numerator / denominator;
    }

    // Driver function
    public static void main(String args[])
    {
        int X[] = {15, 18, 21, 24, 27};
        int Y[] = {25, 25, 27, 31, 32};

        // Find the size of array.
        int n = X.length;

        // Function call to correlationCoefficient.
        System.out.printf("%.6f", correlationCoefficient(X, Y, n));
    }
}
```
Python
```python
# Python program to find the correlation coefficient
import math


# Returns the correlation coefficient of X and Y (each of length n).
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    for i in range(n):
        # sum of elements of array X.
        sum_X = sum_X + X[i]
        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]
        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]
        # sums of squares of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

    # use formula for calculating correlation coefficient.
    corr = (n * sum_XY - sum_X * sum_Y) / math.sqrt(
        (n * squareSum_X - sum_X * sum_X)
        * (n * squareSum_Y - sum_Y * sum_Y))
    return corr


# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))
```
C#
```csharp
// C# program to find the correlation coefficient
using System;

class GFG {
    // Returns the correlation coefficient of X and Y (each of length n).
    static double correlationCoefficient(int[] X, int[] Y, int n)
    {
        // long totals avoid integer overflow on larger inputs.
        long sum_X = 0, sum_Y = 0, sum_XY = 0;
        long squareSum_X = 0, squareSum_Y = 0;

        for (int i = 0; i < n; i++)
        {
            // sum of elements of array X.
            sum_X += X[i];
            // sum of elements of array Y.
            sum_Y += Y[i];
            // sum of X[i] * Y[i].
            sum_XY += (long)X[i] * Y[i];
            // sums of squares of array elements.
            squareSum_X += (long)X[i] * X[i];
            squareSum_Y += (long)Y[i] * Y[i];
        }

        // use formula for calculating correlation coefficient.
        double numerator = (double)(n * sum_XY - sum_X * sum_Y);
        double denominator = Math.Sqrt(
            (double)(n * squareSum_X - sum_X * sum_X)
            * (double)(n * squareSum_Y - sum_Y * sum_Y));
        return numerator / denominator;
    }

    // Driver function
    public static void Main()
    {
        int[] X = {15, 18, 21, 24, 27};
        int[] Y = {25, 25, 27, 31, 32};

        // Find the size of array.
        int n = X.Length;

        // Function call to correlationCoefficient.
        Console.Write(Math.Round(correlationCoefficient(X, Y, n), 6));
    }
}
```
**Complexity Analysis**
- **Time Complexity:** O(n), where n is the size of the given arrays, as each element is processed once.
- **Auxiliary Space:** O(1), since only a few extra variables are used, regardless of input size.
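Because only a handful of running totals are kept, the same computation can consume its input one pair at a time. The sketch below is a hypothetical variant, not the article's program: the name `streaming_correlation` and the choice to raise `ValueError` when the denominator is zero are assumptions made here for illustration.

```python
import math

def streaming_correlation(pairs):
    """Hypothetical helper: Pearson r from any iterable of (x, y) pairs."""
    n = 0
    sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0
    for x, y in pairs:
        # Update a counter and five running totals; nothing else is stored.
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
        sum_y2 += y * y
    denom_sq = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if denom_sq <= 0:
        # Fewer than two points, or one variable is constant: r is undefined.
        raise ValueError("correlation is undefined for this input")
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(denom_sq)

# zip() yields one pair at a time, so no combined list is built.
pairs = zip([43, 21, 25, 42, 57, 59], [99, 65, 79, 75, 87, 81])
print(f"{streaming_correlation(pairs):.6f}")  # 0.529809
```

The explicit check matters because the formula divides by zero when either variable has no spread, a case the programs above do not guard against.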
This efficient approach enables quick computation of the correlation coefficient, helping you analyze relationships between datasets. Whether in statistics, finance, or other domains, understanding correlation is essential for data-driven decision-making.