--- title: "The NIPALS algorithm" author: "Kevin Wright" date: "2017-10-27" output: rmarkdown::html_vignette bibliography: nipals.bib vignette: > %\VignetteIndexEntry{NIPALS algorithm} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r echo=FALSE,results="hide"} options(width=90) ``` The principal components of bfX′X\bf X'XbfX′X, where bfX\bf XbfX is a column-centered matrix, can be found by several methods, including SVD and NIPALS. # Singular value decomposition The SVD (Singular Value Decomposition) of a matrix XXX is bfX=USV′,\bf X = U S V',bfX=USV′, where bfU\bf UbfU (ntimesr)(n \times r)(ntimesr) and bfV\bf VbfV (ktimesr)(k \times r)(ktimesr) are orthogonal matrices and bfS\bf SbfS is a diagonal matrix of rrr singular values. SVD does not allow missing values in the data. # NIPALS The NIPALS (Nonlinear Iterative Partial Least Squares) algorithm can be used to find the first few (or all) principal components with the decomposition bfX=bfTbfP′\bf X = \bf T \bf P 'bfX=bfTbfP′ where the columns of bfT\bf TbfT are called *scores* and the columns of bfP\bf PbfP (the rows of bfP′\bf P'bfP′) are called the *loadings*. The algorithm begins by initializing h=1h=1h=1 and bfXh=bfX\bf X_h = \bf XbfXh=bfX, then proceeds through the following basic steps: 1. Choose bfth\bf t_hbfth as any column of bfXh\bf X_hbfXh. 2. Compute loadings bfph=Xh′th/th′th\bf p_h = X_h' t_h / t_h' t_hbfph=Xh′th/th′th. 3. Let bfph=ph/sqrtph′ph\bf p_h = p_h / \sqrt{p_h' p_h}bfph=ph/sqrtph′ph. 4. Compute scores bfth=Xhph/ph′ph\bf t_h = X_h p_h / p_h' p_hbfth=Xhph/ph′ph. Repeat (3) and (4) until convergence for the hthh^{th}hth principal component. Let bfXh+1=bfXh−thph′\bf X_{h+1} = \bf X_h - t_h p_h'bfXh+1=bfXh−thph′. Let lambdah=bfth′t\lambda_h = \bf t_h' tlambdah=bfth′t (eigen value). Increment h=h+1h = h + 1h=h+1 and repeat for the next principal component. Assemble the columns of bfT\bf TbfT from the bfth\bf t_hbfth and the columns of bfP\bf PbfP from the vectors bfph\bf p_hbfph. The resulting PCs may be scaled in different ways. One way to scale the PCA solution is to define the loadings bfP=V\bf P = VbfP=V and bfT=U′S\bf T = U'SbfT=U′S. ## Missing data The NIPALS algorithm can be modified to accommodate missing values using the method of @martens2001multivariate (p. 381). If, for a certain variable kkk [column of bfX\bf XbfX], a missing value is encountered in bfX\bf XbfX for a certain object iii [row of bfX\bf XbfX], then the corresponding elements in bftih\bf t_{ih}bftih must also be skipped in the calculation of the loadings, which for bfX\bf XbfX-variable kkk is bfphk=Xk,h−1th′/(th′th).\bf p_{hk} = X_{k,h-1} t_h' / (t_h' t_h) .bfphk=Xk,h−1th′/(th′th). Likewise, if, for a certain sample iii [row of bfX\bf XbfX], a missing value is encountered in bfX\bf XbfX for a certain variable kkk [column of bfX\bf XbfX], then the corresponding elements in bfpkh\bf p_{kh}bfpkh must also be skipped in calculating the scores, which for sample iii is bftih=Xi,h−1ph/(ph′ph)\bf t_{ih} = X_{i,h-1} p_h / (p_h' p_h)bftih=Xi,h−1ph/(ph′ph) This method may have convergence problems if there are many missing values. ## Gram-Schmidt orthogonalization Because of the accumulation of floating-point errors, the orthogonality of the principal components is quickly lost as the number of components increases. @andrecut2009parallel provided a Gram-Schmidt modified version of NIPALS that stabilizes the orthogonality by re-orthogonalizing the scores and loadings at each iteration. 
## Gram-Schmidt orthogonalization

Because of the accumulation of floating-point errors, the orthogonality of the principal components is quickly lost as the number of components increases. @andrecut2009parallel provided a Gram-Schmidt modified version of NIPALS that stabilizes the orthogonality by re-orthogonalizing the scores and loadings at each iteration. The 'corrected' terms are
$$ \bf p_c = p - P_{1:h} P_{1:h}' p $$
and
$$ \bf t_c = t - T_{1:h} T_{1:h}' t , $$
where $\bf P_{1:h}$ and $\bf T_{1:h}$ are the loadings and scores matrices based on the first $h$ principal components. Since $\bf P_{1:h} P_{1:h}'$ only needs to be calculated once for each PC (and can be updated incrementally), the orthogonalization is not very computationally expensive. This correction method is also used by SAS PROC HPPRINCOMP (which does not allow missing values). A sketch of this correction follows Example 1 below.

## Example 1

A small dataset with two missing values.

```{r}
require(nipals)
B <- matrix(c(50, 67, 90, 98, 120,
              55, 71, 93, 102, 129,
              65, 76, 95, 105, 134,
              50, 80, 102, 130, 138,
              60, 82, 97, 135, 151,
              65, 89, 106, 137, 153,
              75, 95, 117, 133, 155),
            ncol=5, byrow=TRUE)
B2 <- B
B2[1,1] <- B2[2,1] <- NA
m0 <- svd(scale(B)) # center and scale
```

```{r}
m1 <- nipals::nipals(B2, gramschmidt=FALSE)
m2 <- nipals::nipals(B2, gramschmidt=TRUE)
```

Model `m1` omits the Gram-Schmidt orthogonalization step at each iteration; model `m2` includes it. The eigenvalues for the two models are very similar.

```{r}
round( m1$eig, 3)
round( m2$eig, 3)
```

In theory, the loadings matrix $\bf P$ is orthogonal, so that $\bf P'P = I$. If there are missing values, however, then the calculation of approximate PCs causes numerical errors to accumulate, so that in practice only the first few components can be calculated accurately. (The coordinates of the last PC can often be quite poor.) In this small example, the first 3 PCs of model `m1` are fairly orthogonal, but the 4th and 5th PCs are not so good. For model `m2`, the PCs are nearly exactly orthogonal.

```{r}
# loadings
round( crossprod(m1$loadings), 3) # P'P = t(P) %*% P
round( crossprod(m2$loadings), 3)
```

Also in theory, $\bf T'T = I$ (if the eigenvalues are removed from $\bf T$), but missing values again invalidate this identity unless the Gram-Schmidt method is used.

```{r}
# scores
round( crossprod(m1$scores), 3) # T'T = t(T) %*% T
round( crossprod(m2$scores), 3)
```
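To make the correction concrete, here is a sketch of how the corrected terms could be computed inside the component loop of the hypothetical `nipals_sketch` function above; this is not the package code. The division by `colSums(Tprev^2)` allows for score columns that are not unit length, as in the sketch; it reduces to $\bf t - T_{1:h} T_{1:h}' t$ when the score columns are normalized.

```{r eval=FALSE}
# Gram-Schmidt correction against the h-1 components already found,
# written as a fragment of the loop body in nipals_sketch().
if(h > 1) {
  Pprev <- Pmat[ , 1:(h-1), drop=FALSE]
  Tprev <- Tmat[ , 1:(h-1), drop=FALSE]
  ph <- ph - Pprev %*% crossprod(Pprev, ph)  # p_c = p - P P' p
  ph <- ph / sqrt(sum(ph^2))                 # re-normalize
  # t_c = t - T T' t, adjusted for non-unit score columns
  th <- th - Tprev %*% (crossprod(Tprev, th) / colSums(Tprev^2))
}
```

# Bibliography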