Gram-Schmidt Process for ML

Last Updated : 23 Jul, 2025

The Gram-Schmidt process converts a set of linearly independent vectors into an orthonormal basis: a set of mutually orthogonal vectors, each normalized to unit length, that spans the same space as the original set.

This process matters in machine learning because an orthonormal basis improves numerical stability and simplifies calculations such as projections and least squares solves.

Orthogonality and Normalization

There are two basic concepts that must be understood before moving on to the Gram-Schmidt process: orthogonality and normalization.

**Orthogonality:** Two vectors are orthogonal if their dot product equals zero, which means they are at 90 degrees to one another. In machine learning, orthogonal vectors are convenient because they make matrix operations simpler and more stable.
**Normalization:** A vector is normalized if its magnitude (length) equals one. This is achieved by dividing every element of the vector by the vector's magnitude. Normalization removes differences in scale between vectors, which helps stabilize learning algorithms.

The two come together in the Gram-Schmidt process to give a set of orthonormal vectors, that is, vectors that are both orthogonal and normalized.
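The two definitions above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot`, `norm`, and `normalize` and the example vectors are illustrative choices, not part of the original text.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Divide every element by the magnitude so the result has length one."""
    length = norm(a)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in a]

a = [3.0, 4.0]
b = [-4.0, 3.0]

print(dot(a, b))      # 0.0, so a and b are orthogonal
print(normalize(a))   # [0.6, 0.8]
```

Here `a` and `b` are orthogonal but not normalized (each has length 5). Normalizing both would give an orthonormal pair.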

The Gram-Schmidt Process Step by Step

The Gram-Schmidt process accepts a set of linearly independent vectors and converts them into an orthonormal set.

Assume we have a set of vectors:

\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, ..., \mathbf{v}_n

We need to convert them into a new orthonormal set of vectors:

\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, ..., \mathbf{u}_n

The process goes in the following manner:

**Step 1: Select the First Vector**

The first orthogonal vector is just the first vector of the original set:

\mathbf{u}_1 = \mathbf{v}_1

To make it normalized, divide it by its length:

\mathbf{e}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|}

This gives us the first orthonormal vector.

**Step 2: Orthogonalize the Second Vector**

To find the second orthogonal vector, remove the component of \mathbf{v}_2 that lies in the direction of \mathbf{e}_1:

\mathbf{u}_2 = \mathbf{v}_2 - \text{proj}_{\mathbf{e}_1} (\mathbf{v}_2)

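Here, the projection is given by:

\text{proj}_{\mathbf{e}_1} (\mathbf{v}_2) = \frac{\mathbf{v}_2 \cdot \mathbf{e}_1}{\mathbf{e}_1 \cdot \mathbf{e}_1} \mathbf{e}_1

Because \mathbf{e}_1 has unit length, \mathbf{e}_1 \cdot \mathbf{e}_1 = 1, so the projection simplifies to (\mathbf{v}_2 \cdot \mathbf{e}_1) \mathbf{e}_1.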

After obtaining \mathbf{u}_2, normalize it:

\mathbf{e}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|}
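As a small worked example of Steps 1 and 2, the sketch below orthonormalizes two vectors in plain Python. The vectors `v1` and `v2` and the helper functions are assumptions chosen for illustration. The projection is written as (v2 . e1) e1, which is valid because `e1` has unit length.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, c):
    return [c * x for x in a]

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

v1 = [1.0, 1.0, 0.0]
v2 = [1.0, 0.0, 1.0]

# Step 1: u1 = v1, then normalize to get e1.
e1 = normalize(v1)

# Step 2: subtract the projection of v2 onto e1, then normalize.
proj = scale(e1, dot(v2, e1))
u2 = subtract(v2, proj)
e2 = normalize(u2)

print(u2)            # approximately [0.5, -0.5, 1.0]
print(dot(e1, e2))   # approximately 0
```

Up to rounding, `e1` and `e2` each have length one and their dot product is zero.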

**Step 3: Make the Third Vector Orthogonal**

For the third vector, remove the components in the directions of both \mathbf{e}_1 and \mathbf{e}_2:

\mathbf{u}_3 = \mathbf{v}_3 - \text{proj}_{\mathbf{e}_1} (\mathbf{v}_3) - \text{proj}_{\mathbf{e}_2} (\mathbf{v}_3)

After obtaining \mathbf{u}_3, normalize it:

\mathbf{e}_3 = \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|}

**Step 4: Repeat for All Vectors**

This process is repeated for all vectors in the original set. The general formula for any vector \mathbf{u}_k is:

\mathbf{u}_k = \mathbf{v}_k - \sum_{i=1}^{k-1} \text{proj}_{\mathbf{e}_i} (\mathbf{v}_k)

After obtaining \mathbf{u}_k, normalize it:

\mathbf{e}_k = \frac{\mathbf{u}_k}{\|\mathbf{u}_k\|}

The resulting vectors \mathbf{e}_1, ..., \mathbf{e}_n are orthonormal and span the same space as the original set.
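The general formula can be sketched as a short function. This is one possible plain-Python implementation of the classical process described above, not a reference implementation. The function name `gram_schmidt` and the `tol` threshold for detecting linearly dependent input are assumptions added for illustration.

```python
import math

def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt on a list of equal-length float lists.

    Returns the orthonormal vectors e_1, ..., e_n. The input is assumed
    to be linearly independent; `tol` is an illustrative absolute
    threshold used to reject (nearly) dependent input.
    """
    basis = []
    for v in vectors:
        # u_k = v_k - sum_i proj_{e_i}(v_k); since each e_i has unit
        # length, the projection is (v_k . e_i) e_i.
        u = list(v)
        for e in basis:
            coeff = sum(x * y for x, y in zip(v, e))
            u = [ui - coeff * ei for ui, ei in zip(u, e)]
        length = math.sqrt(sum(x * x for x in u))
        if length < tol:
            raise ValueError("input vectors are linearly dependent, or nearly so")
        # e_k = u_k / ||u_k||
        basis.append([x / length for x in u])
    return basis

E = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

# Print the matrix of pairwise dot products; it should be close to the identity.
for ei in E:
    print([round(sum(x * y for x, y in zip(ei, ej)), 10) for ej in E])
```

The check at the end prints the dot product of every pair of output vectors. For an orthonormal set, these are approximately 1 on the diagonal and 0 elsewhere.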

Importance of Gram-Schmidt Process in Machine Learning

Machine learning algorithms tend to handle large datasets as matrices. The main reasons the Gram-Schmidt process is useful are:

**Numerical Stability:** When handling big datasets, floating-point errors can accumulate. Working in an orthonormal basis helps limit this, because multiplying by a matrix with orthonormal columns does not amplify errors.
**Dimensionality Reduction:** Principal Component Analysis (PCA), one of the most widely used feature reduction methods, relies on orthogonal transformations, the same kind of transformation that the Gram-Schmidt process produces.
**Effective Matrix Decomposition:** Many machine learning algorithms decompose matrices into simpler forms. QR decomposition, a key step in solving linear regression and least squares problems, can be computed with the Gram-Schmidt process.
**Feature Selection:** In certain situations, the Gram-Schmidt process can help choose the most relevant features by identifying redundant information: a feature that leaves almost nothing after its components along earlier features are removed adds little new information.
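To make the feature selection point concrete, the sketch below flags feature columns that are numerically linear combinations of earlier columns. The function name `redundant_columns`, the relative tolerance `tol`, and the sample features are all hypothetical. Coefficients are computed from the running residual, which is a common variant of the process.

```python
import math

def redundant_columns(columns, tol=1e-10):
    """Return indices of columns that add (almost) no new direction.

    `columns` is a list of feature columns (lists of floats). A column
    is flagged when its residual, after removing the components along
    the previously kept columns, is tiny relative to its own length.
    """
    basis = []
    redundant = []
    for k, v in enumerate(columns):
        u = list(v)
        for e in basis:
            coeff = sum(x * y for x, y in zip(u, e))
            u = [ui - coeff * ei for ui, ei in zip(u, e)]
        residual = math.sqrt(sum(x * x for x in u))
        original = math.sqrt(sum(x * x for x in v))
        if residual <= tol * original:
            redundant.append(k)
        else:
            basis.append([x / residual for x in u])
    return redundant

f0 = [1.0, 2.0, 3.0, 4.0]
f1 = [2.0, 0.0, 1.0, 3.0]
f2 = [a + 2.0 * b for a, b in zip(f0, f1)]  # exact combination of f0 and f1

print(redundant_columns([f0, f1, f2]))  # [2]
```

Real features are rarely exact combinations of each other, so in practice the tolerance would need to be chosen with the noise level of the data in mind.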

Applications of the Gram-Schmidt Process in Machine Learning

**QR Decomposition in Regression Models:** Linear regression problems frequently require solving systems of equations. QR decomposition, which can be computed with the Gram-Schmidt method, factors a matrix into a matrix Q with orthonormal columns and an upper triangular matrix R. This makes least squares problems efficient to solve.
**Principal Component Analysis (PCA):** PCA reduces the number of dimensions in a dataset while preserving important information. The principal components are mutually orthogonal, and the Gram-Schmidt process can help keep a computed set of components orthonormal before the data is projected into a lower-dimensional space.
**Eigenvector Computations:** Many machine learning models use eigenvectors for feature extraction and clustering. The Gram-Schmidt process is useful for making a set of eigenvectors orthonormal, which can improve the numerical accuracy of later steps.
**Data Preprocessing:** In datasets with correlated or redundant features, the Gram-Schmidt process can be used to express the data in an orthonormal basis, which can stabilize machine learning models and save computation time.
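The QR application can be illustrated with a small least squares fit. The sketch below builds Q and R with classical Gram-Schmidt and then solves R x = Qᵀ b by back substitution. The function names, the column-list layout, and the sample data are assumptions made for this example, and the input matrix is assumed to have linearly independent columns.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qr_gram_schmidt(columns):
    """Factor A (given as a list of columns) into Q and R.

    Q is returned as a list of orthonormal columns, R as a list of rows
    of an upper triangular matrix, with R[i][k] = q_i . a_k.
    """
    n = len(columns)
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for k, a in enumerate(columns):
        u = list(a)
        for i, q in enumerate(Q):
            R[i][k] = dot(q, a)
            u = [ui - R[i][k] * qi for ui, qi in zip(u, q)]
        R[k][k] = math.sqrt(dot(u, u))
        Q.append([ui / R[k][k] for ui in u])
    return Q, R

def solve_least_squares(columns, b):
    """Minimize ||A x - b|| by solving R x = Q^T b with back substitution."""
    Q, R = qr_gram_schmidt(columns)
    n = len(columns)
    y = [dot(q, b) for q in Q]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x

# Fit y = c0 + c1 * t to the points (0, 1), (1, 3), (2, 5).
ones = [1.0, 1.0, 1.0]
t = [0.0, 1.0, 2.0]
y_obs = [1.0, 3.0, 5.0]

coeffs = solve_least_squares([ones, t], y_obs)
print(coeffs)  # approximately [1.0, 2.0]
```

The three points lie exactly on the line y = 1 + 2t, so the fitted intercept and slope are recovered up to rounding.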

Limitations of the Gram-Schmidt Process

Despite its advantages, the Gram-Schmidt process has some limitations:

**Numerical Instability:** In floating-point arithmetic, small numerical errors can accumulate, so the computed vectors may lose orthogonality, especially when the input vectors are nearly linearly dependent. A variant known as the Modified Gram-Schmidt process is sometimes used to improve stability.
**Computational Cost:** For high-dimensional data, the procedure can be computationally costly. Other methods, like Singular Value Decomposition (SVD), may be preferred in some cases.
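The difference between the classical and modified variants can be seen on a small, deliberately ill-conditioned example. In the sketch below, the only change is which vector the projection coefficient is computed from: the original v_k (classical) or the running residual (modified). The test vectors follow a standard textbook construction with nearly parallel inputs. The function name `orthonormalize` and the value of `eps` are assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(vectors, modified):
    """Gram-Schmidt on a list of vectors.

    modified=False: classical, coefficients use the original v_k.
    modified=True:  modified, coefficients use the running residual u.
    """
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            coeff = dot(u, e) if modified else dot(v, e)
            u = [ui - coeff * ei for ui, ei in zip(u, e)]
        length = math.sqrt(dot(u, u))
        basis.append([ui / length for ui in u])
    return basis

eps = 1e-8  # small enough that 1 + eps**2 rounds to 1 in double precision
vectors = [
    [1.0, eps, 0.0, 0.0],
    [1.0, 0.0, eps, 0.0],
    [1.0, 0.0, 0.0, eps],
]

q_classical = orthonormalize(vectors, modified=False)
q_modified = orthonormalize(vectors, modified=True)

# In exact arithmetic both dot products would be 0.
print(abs(dot(q_classical[1], q_classical[2])))  # roughly 0.5: orthogonality is lost
print(abs(dot(q_modified[1], q_modified[2])))    # close to 0
```

On this input, the classical variant returns second and third vectors that are far from orthogonal, while the modified variant keeps them orthogonal to within rounding error.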