Solving Linear Equations Data Science

Last Updated : 20 Mar, 2026

Linear Algebra is important in Data Science as it helps represent and process data efficiently, especially for high-dimensional datasets. It also helps in understanding relationships between variables. This is useful in the following ways:

Detecting Linear Relationships Between Attributes

Linear relationships among attributes are identified using the concepts of null space and nullity. These concepts help determine whether variables are linearly dependent and whether some attributes can be expressed as combinations of others.
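As a rough illustration of this idea, the sketch below computes the rank of a small data matrix by exact Gaussian elimination and derives the nullity as the number of attributes minus the rank. The `matrix_rank` helper and the sample `data` are made up for this example; in practice a library routine such as NumPy's `numpy.linalg.matrix_rank` would typically be used instead.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical data: 4 samples, 3 attributes, where attribute 3 = attribute 1 + attribute 2.
data = [[1, 2, 3],
        [2, 0, 2],
        [0, 1, 1],
        [3, 1, 4]]

rank = matrix_rank(data)
nullity = len(data[0]) - rank  # rank-nullity theorem: nullity = n - rank
print("rank:", rank, "nullity:", nullity)
```

A nullity greater than zero signals that at least one attribute can be written as a linear combination of the others.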

A generalized system of linear equations is represented as:

A x = b

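**Where:** A is an m × n matrix of coefficients (m equations, n variables), x is the n × 1 vector of unknowns, and b is the m × 1 vector of constants.

Figure: m vs n Cases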

Rank Conditions in Linear Systems

In general there are three cases that need to be understood when analyzing linear systems. These cases depend on the rank of the matrix and describe how rows and columns relate to one another. Each case is considered independently.

Case 1: m = n

If A is a full rank matrix, that is, the determinant of A is not equal to 0, the system has a unique solution given by:

Ax=b

x=A^{-1}b

Figure: Matrix Solution Cases

**1. Unique Solution**

Consider the given matrix equation

\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \end{bmatrix}

x = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 7 \\ 10 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}

Therefore, the solution for the given example is (x1 , x2) = (1, 2)
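The calculation above can be reproduced with a few lines of code. This is a minimal sketch that hard-codes the 2 × 2 inverse formula; `solve_2x2` is a hypothetical helper written for this example, and a general-purpose routine such as NumPy's `numpy.linalg.solve` would normally be preferred for larger systems.

```python
from fractions import Fraction

def solve_2x2(A, b):
    """Solve a 2x2 system A x = b via the explicit inverse; requires det(A) != 0."""
    (a11, a12), (a21, a22) = [[Fraction(v) for v in row] for row in A]
    b1, b2 = Fraction(b[0]), Fraction(b[1])
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("A is singular, so there is no unique solution")
    # x = A^{-1} b with A^{-1} = (1/det) * [[a22, -a12], [-a21, a11]]
    x1 = (a22 * b1 - a12 * b2) / det
    x2 = (-a21 * b1 + a11 * b2) / det
    return x1, x2

x1, x2 = solve_2x2([[1, 3], [2, 4]], [7, 10])
print(x1, x2)  # 1 2
```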

**2. Infinite Solutions**

Consider the given matrix equation, in which the determinant of A is 0:

\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}

Checking consistency

\begin{bmatrix} x_{1} + 2x_{2} \\ 2x_{1} + 4x_{2} \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}

Row 2 is twice Row 1 so the system has only one linearly independent equation. Since there are two variables but only one independent equation, the system is consistent and has infinitely many solutions.

x_{1}+2x_{2}=5
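One way to write the whole solution set, as a sketch using a free parameter t (a symbol introduced here for illustration, not part of the original), is to set x_2 = t and solve the remaining equation for x_1:

x_{1} = 5 - 2t, \quad x_{2} = t, \quad t \in \mathbb{R}

For example, t = 0 gives (5, 0) and t = 1 gives (3, 1); both satisfy x_{1}+2x_{2}=5 and 2x_{1}+4x_{2}=10.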

**3. No Solution**

Consider the given matrix equation, in which the determinant of A is again 0:

\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 9 \end{bmatrix}

Checking consistency

\begin{bmatrix} x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 9 \end{bmatrix}

Compare Row 2 with 2 × Row 1:

2(x_{1}+2x_{2})=2x_{1}+4x_{2}=10\neq9

Row 2 requires 2x_{1}+4x_{2}=9, which contradicts this result. The system is inconsistent, so no (x1, x2) satisfies both equations.
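The three outcomes for a square system can also be told apart programmatically. The sketch below compares the rank of A with the rank of the augmented matrix [A | b]; this rank test is the Rouché-Capelli criterion, which the text above does not name. The helper names `matrix_rank` and `classify` are assumptions made for this example.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank

def classify(A, b):
    """Classify A x = b by comparing rank(A) with rank([A | b])."""
    n = len(A[0])  # number of unknowns
    augmented = [list(row) + [rhs] for row, rhs in zip(A, b)]
    rank_a, rank_aug = matrix_rank(A), matrix_rank(augmented)
    if rank_a < rank_aug:
        return "no solution"  # b adds an equation that contradicts the others
    return "unique solution" if rank_a == n else "infinitely many solutions"

# The three examples from this section, in order.
results = [
    classify([[1, 3], [2, 4]], [7, 10]),
    classify([[1, 2], [2, 4]], [5, 10]),
    classify([[1, 2], [2, 4]], [5, 9]),
]
print(results)
```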

**Case 2: m > n**

**An optimization perspective**

When there are more equations than variables, the system A x = b usually has no exact solution. Instead of looking for one, we can find an x that makes the error Ax-b as small as possible.

Let the error vector be:

e=Ax-b

We can minimize all the errors collectively by minimizing \sum_{i=1}^{m} e_i^{2}

So, the optimization problem becomes

\begin{aligned} \min_{x} f(x) &= \min_{x} \sum_{i=1}^{m} e_i^{2} \\ &= \min_{x} \left[ (Ax-b)^{T}(Ax-b) \right] \\ &= \min_{x} \left[ (x^{T}A^{T}-b^{T})(Ax-b) \right] \end{aligned}

Here, the objective f(x) is a function of x, so solving this optimization problem gives us the solution for x. We can obtain it by differentiating f(x) with respect to x and setting the gradient to zero.

\nabla f(x)=0

Now, differentiating f(x) and setting the differential to zero results in

\begin{aligned} \nabla f(x) &= 0 \\ 2(A^{T}A)x - 2A^{T}b &= 0 \end{aligned}
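The gradient above can be checked by expanding the objective. As a short worked derivation, using the standard identities \nabla_x (x^{T}Mx) = 2Mx for a symmetric matrix M and \nabla_x (c^{T}x) = c (M and c are placeholder symbols introduced here):

\begin{aligned} f(x) &= (x^{T}A^{T}-b^{T})(Ax-b) \\ &= x^{T}A^{T}Ax - 2b^{T}Ax + b^{T}b \\ \nabla f(x) &= 2A^{T}Ax - 2A^{T}b \end{aligned}

The two cross terms x^{T}A^{T}b and b^{T}Ax are equal scalars, which is why they combine into 2b^{T}Ax.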

Assuming that all the columns of A are linearly independent, A^{T}A is invertible and we can solve for x:

x = (A^{T}A)^{-1}A^{T}b

**Note:** This solution x might not satisfy all the equations, but it ensures that the errors in the equations are collectively minimized.

**Example**

Consider the given matrix equation:

\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 1\\ -0.5\\ 5\\ \end{bmatrix}

Here m=3, n=2

Using the optimization concept

\begin{aligned} x &= (A^{T}A)^{-1}A^{T}b \\\begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} &= \begin{bmatrix} 0.2&-0.6\\ -0.6&2.8\\ \end{bmatrix} \begin{bmatrix} 15\\ 5\\ \end{bmatrix} \\ &= \begin{bmatrix} 0\\ 5\\ \end{bmatrix} \end{aligned}

Therefore, the solution for the given linear equation is (x_1, x_2) = (0, 5)

Substituting in the equation shows

\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} \begin{bmatrix} 0\\ 5\\ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 5\\ \end{bmatrix} \neq \begin{bmatrix} 1\\ -0.5\\ 5\\ \end{bmatrix}

So the important point to notice in Case 2 is that when we have more equations than variables, we can use the least squares solution x = (A^{T}A)^{-1}A^{T}b, provided that (A^{T}A)^{-1} exists.

Keep in mind that (A^{T}A)^{-1} exists only if the columns of A are linearly independent.
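The Case 2 example can be reproduced in code. The sketch below forms A^{T}A and A^{T}b (often called the normal equations, a name the text above does not use) and solves them exactly with fractions. The helpers `transpose`, `matmul`, `solve_square`, and `least_squares` are assumptions written for this illustration; for real data a library routine such as NumPy's `numpy.linalg.lstsq` is usually the more robust choice.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve_square(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination; M must be invertible."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # scale the pivot row so the pivot is 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * q for a, q in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def least_squares(A, b):
    """Return x = (A^T A)^{-1} A^T b, assuming the columns of A are independent."""
    A = [[Fraction(str(v)) for v in row] for row in A]
    b = [Fraction(str(v)) for v in b]
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [sum(a * y for a, y in zip(row, b)) for row in At]
    return solve_square(AtA, Atb)

A = [[1, 0], [2, 0], [3, 1]]
b = [1, -0.5, 5]
x = least_squares(A, b)

# Residual e = A x - b and the sum of squared errors that was minimized.
errors = [sum(a * v for a, v in zip(row, x)) - Fraction(str(rhs)) for row, rhs in zip(A, b)]
sse = sum(e * e for e in errors)
print([float(v) for v in x], float(sse))
```

The residual is non-zero in the first two equations, which matches the substitution check in the example: the least squares solution minimizes the total squared error rather than satisfying every equation.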

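**Case 3: m < n**

When there are fewer equations than variables, the system generally has infinitely many solutions. To single one out, we choose the solution with the smallest norm by solving the optimization problem min\left[ \frac{1}{2}x^{T}x \right] such that Ax=b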

We can define a Lagrangian function

min[ f(x, \lambda)] =min\left[ \frac{1}{2}x^{T}x + \lambda^{T}(Ax-b) \right]

Differentiating the Lagrangian with respect to x and setting the result to zero gives

\begin{aligned} x + A^{T}\lambda &= 0 \\ x &= -A^{T}\lambda \end{aligned}

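Pre-multiplying x = -A^{T}\lambda by A and using the constraint Ax=b

\begin{aligned} Ax&=b \\ A(-A^{T}\lambda) &= b \\ -(AA^{T})\lambda &= b \end{aligned}

Assuming that all the rows of A are linearly independent, AA^{T} is invertible, so \lambda = -(AA^{T})^{-1}b and

\begin{aligned} x &= -A^{T}\lambda \\ &= A^{T}(AA^{T})^{-1}b \end{aligned}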

**Example**

Consider the given matrix equation:

\begin{bmatrix} 1&2&3\\ 0&0&1\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ \end{bmatrix}

Here m=2 and n=3

Using the optimization concept

\begin{aligned} x &= A^{T}(AA^{T})^{-1}b \\ &= \begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} \left( \begin{bmatrix} 1&2&3\\ 0&0&1\\ \end{bmatrix} \begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} \right )^{-1} \begin{bmatrix} 2\\ 1\\ \end{bmatrix} \\ &= \begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} \begin{bmatrix} -0.2\\ 1.6\\ \end{bmatrix} \\ \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix} &= \begin{bmatrix} -0.2\\ -0.4\\ 1\\ \end{bmatrix} \end{aligned}

The solution for the given sample is (x_1, x_2, x_3 ) = (-0.2, -0.4, 1)

You can easily verify that

\begin{bmatrix} 1&2&3\\ 0&0&1\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ \end{bmatrix}
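The Case 3 example can be checked the same way. This sketch evaluates x = A^{T}(AA^{T})^{-1}b with exact fractions and then confirms that Ax reproduces b. The helper names are assumptions written for this illustration; with NumPy, multiplying b by `numpy.linalg.pinv(A)` is a common way to obtain the same minimum-norm solution.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve_square(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination; M must be invertible."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * q for a, q in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def minimum_norm_solution(A, b):
    """Return x = A^T (A A^T)^{-1} b, assuming the rows of A are independent."""
    A = [[Fraction(str(v)) for v in row] for row in A]
    b = [Fraction(str(v)) for v in b]
    At = transpose(A)
    y = solve_square(matmul(A, At), b)  # y = (A A^T)^{-1} b
    return [sum(a * v for a, v in zip(row, y)) for row in At]  # x = A^T y

A = [[1, 2, 3], [0, 0, 1]]
b = [2, 1]
x = minimum_norm_solution(A, b)

# Verify that the solution satisfies every equation exactly.
check = [sum(a * v for a, v in zip(row, x)) for row in A]
print([float(v) for v in x], [float(v) for v in check])
```

Other exact solutions exist, for instance (-1, 0, 1), but it has squared norm 2, whereas the solution computed here has squared norm 1.2.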

**Generalization**

Properties of Matrix Rank

The row rank of a matrix is always equal to its column rank, regardless of the matrix size.

Full Row Rank vs Full Column Rank

Consider a matrix A of size m x n

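Full Row Rank

A has full row rank when all m rows are linearly independent, that is, rank(A) = m. This requires m ≤ n and is the assumption used in Case 3.

Full Column Rank

A has full column rank when all n columns are linearly independent, that is, rank(A) = n. This requires m ≥ n and is the assumption used in Case 2.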