Gram-Schmidt Orthogonalization and Regression (original) (raw)

This vignette illustrates the process of transforming a set of variables to a new set of uncorrelated (orthogonal) variables. It carries out the Gram-Schmidt process directly by successively projecting each successive variable on the previous ones and subtracting (taking residuals). This is equivalent by replacing each successive variable by its residuals from a least squares regression on the previous variables.

When this method is used on the predictors in a regression problem, the resulting orthogonal variables have exactly the sameanova() summary (based on “Type I”, sequential sums of squares) as do original variables.

Setup

We use the class data set, but convert the character factor sex to a dummy (0/1) variable male.

library(matlib)
data(class)
class$male <- as.numeric(class$sex=="M")

For later use in regression, we create a variable IQ as a response variable

class <- transform(class, 
                   IQ = round(20 + height + 3*age -.1*weight -3*male + 10*rnorm(nrow(class))))
head(class)

##         sex age height weight male  IQ
## Alfred    M  14   69.0  112.5    1 124
## Alice     F  13   56.5   84.0    0 117
## Barbara   F  13   65.3   98.0    0 105
## Carol     F  14   62.8  102.5    0 122
## Henry     M  14   63.5  102.5    1 102
## James     M  12   57.3   83.0    1  87

Reorder the predictors we want, forming a numeric matrix,X.

X <- as.matrix(class[,c(3,4,2,5)])
head(X)

##         height weight age male
## Alfred    69.0  112.5  14    1
## Alice     56.5   84.0  13    0
## Barbara   65.3   98.0  13    0
## Carol     62.8  102.5  14    0
## Henry     63.5  102.5  14    1
## James     57.3   83.0  12    1

Orthogonalization by projections

The Gram-Schmidt process treats the variables in a given order, according to the columns in X. We start with a new matrixZ consisting of X[,1]. Then, find a new variable Z[,2] orthogonal to Z[,1] by subtracting the projection of X[,2] onZ[,1].

Z <- cbind(X[,1], 0, 0, 0)
Z[,2] <- X[,2] - Proj(X[,2], Z[,1])
crossprod(Z[,1], Z[,2])     # verify orthogonality

##           [,1]
## [1,] 7.276e-12

Continue in the same way, subtracting the projections ofX[,3] on the previous columns, and so forth

Z[,3] <- X[,3] - Proj(X[,3], Z[,1]) - Proj(X[,3], Z[,2]) 
Z[,4] <- X[,4] - Proj(X[,4], Z[,1]) - Proj(X[,4], Z[,2]) - Proj(X[,4], Z[,3])

Note that if any column of X is a linear combination of the previous columns, the corresponding column of Z will be all zeros.

These computations are similar to the following set of linear regressions:

z2 <- residuals(lm(X[,2] ~ X[,1]), type="response")
z3 <- residuals(lm(X[,3] ~ X[,1:2]), type="response")
z4 <- residuals(lm(X[,4] ~ X[,1:3]), type="response")

The columns of Z are now orthogonal, but not of unit length,

zapsmall(crossprod(Z))     # check orthogonality

##       [,1] [,2] [,3] [,4]
## [1,] 57888    0    0    0
## [2,]     0 3249    0    0
## [3,]     0    0    7    0
## [4,]     0    0    0    2

We make standardize column to unit length, giving Z as an orthonormal matrix, such that $Z' Z = I$.

Z <- Z %*% diag(1 / len(Z))    # make each column unit length
zapsmall(crossprod(Z))         # check orthonormal

##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1

colnames(Z) <- colnames(X)

Relationship to QR factorization

The QR method uses essentially the same process, factoring the matrix$\mathbf{X}$ as $\mathbf{X = Q R}$, where $\mathbf{Q}$ is the orthonormal matrix corresponding to Z and $\mathbf{R}$ is an upper triangular matrix. However, the signs of the columns of $\mathbf{Q}$ are arbitrary, andQR() returns QR(X)$Q with signs reversed, compared to Z.

# same result as QR(X)$Q, but with signs reversed
head(Z, 5)

##         height   weight     age     male
## Alfred  0.2868  0.07545 -0.3687  0.12456
## Alice   0.2348 -0.08067  0.3569 -0.02177
## Barbara 0.2714 -0.07715 -0.3862 -0.45170
## Carol   0.2610  0.07058  0.1559 -0.20548
## Henry   0.2639  0.05132  0.1047  0.40538

##        [,1]     [,2]    [,3]     [,4]
## [1,] 0.2868  0.07545 -0.3687  0.12456
## [2,] 0.2348 -0.08067  0.3569 -0.02177
## [3,] 0.2714 -0.07715 -0.3862 -0.45170
## [4,] 0.2610  0.07058  0.1559 -0.20548
## [5,] 0.2639  0.05132  0.1047  0.40538

all.equal( unname(Z), -QR(X)$Q )

## [1] TRUE

Regression with X and Z

We carry out two regressions of IQ on the variables inX and in Z. These are equivalent, in the sense that

The $R^2$ and MSE are the same in both models
Residuals are the same
The Type I tests given by anova() are the same.

class2 <- data.frame(Z, IQ=class$IQ)

Regression of IQ on the original variables in X

mod1 <- lm(IQ ~ height + weight + age + male, data=class)
anova(mod1)

## Analysis of Variance Table
## 
## Response: IQ
##           Df Sum Sq Mean Sq F value Pr(>F)   
## height     1   1131    1131   13.99 0.0038 **
## weight     1    176     176    2.18 0.1707   
## age        1    117     117    1.44 0.2570   
## male       1    226     226    2.79 0.1256   
## Residuals 10    808      81                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regression of IQ on the orthogonalized variables inZ

mod2 <- lm(IQ ~ height + weight + age + male, data=class2)
anova(mod2)

## Analysis of Variance Table
## 
## Response: IQ
##           Df Sum Sq Mean Sq F value Pr(>F)   
## height     1   1131    1131   13.99 0.0038 **
## weight     1    176     176    2.18 0.1707   
## age        1    117     117    1.44 0.2570   
## male       1    226     226    2.79 0.1256   
## Residuals 10    808      81                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This illustrates that anova() tests for linear models are sequential tests. They test hypotheses about the extra contribution of each variable over and above all previous ones, in a given order. These usually do not make substantive sense, except in testing ordered (“hierarchical”) models.