prcomp: Principal Components Analysis
Description
Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.
Usage
prcomp(x, ...)
S3 method for formula
prcomp(formula, data = NULL, subset, na.action, ...)
S3 method for default
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, rank. = NULL, ...)
S3 method for prcomp
predict(object, newdata, ...)
Arguments
formula
a formula with no response variable, referring only to numeric variables.
data
an optional data frame (or similar: see [model.frame](/link/model.frame?package=stats&version=3.5.3)) containing the variables in the formula formula. By default the variables are taken from environment(formula).
subset
an optional vector used to select rows (observations) of the data matrix x.
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of [options](/link/options?package=stats&version=3.5.3), and is [na.fail](/link/na.fail?package=stats&version=3.5.3) if that is unset. The ‘factory-fresh’ default is [na.omit](/link/na.omit?package=stats&version=3.5.3).
...
arguments passed to or from other methods. If x is a formula one might specify scale. or tol.
x
a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.
retx
a logical value indicating whether the rotated variables should be returned.
center
a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
scale.
a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to [scale](/link/scale?package=stats&version=3.5.3).
tol
a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.) With the default null setting, no components are omitted (unless rank. is specified as less than min(dim(x))). Other settings for tol could be tol = 0 or tol = sqrt(.Machine$double.eps), which would omit essentially constant components.
rank.
optionally, a number specifying the maximal rank, i.e., the maximal number of principal components to be used. Can be set as an alternative to tol or in addition to it, and is useful notably when the desired rank is considerably smaller than the dimensions of the matrix.
object
object of class inheriting from "prcomp"
newdata
An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, newdata must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order.
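The newdata argument of predict can be illustrated with a small sketch. The split of USArrests into train and test rows is an arbitrary choice made here for illustration, and those two names are hypothetical; the manual projection simply applies the stored center and scale components before multiplying by rotation.

```r
# Fit on the first 40 states, then project the remaining 10 (arbitrary split).
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]

pc <- prcomp(train, center = TRUE, scale. = TRUE)

# newdata must contain columns with the same names as the training data.
scores <- predict(pc, newdata = test)

# Manual projection: apply the stored centering and scaling, then rotate.
manual <- scale(as.matrix(test), center = pc$center, scale = pc$scale) %*% pc$rotation
stopifnot(isTRUE(all.equal(scores, manual, check.attributes = FALSE)))

# With newdata omitted, the stored scores are returned.
stopifnot(identical(predict(pc), pc$x))
```

The new observations are centred and scaled with the training statistics, not their own, which is what keeps the projected scores comparable with pc$x.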
Value
prcomp returns a list with class "prcomp" containing the following components:
sdev
the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).
rotation
the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function princomp returns this in the element loadings.
x
if retx is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, [napredict](/link/napredict?package=stats&version=3.5.3)() is applied to handle the treatment of values omitted by the na.action.
center, scale
the centering and scaling used, or FALSE.
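The relationships among the returned components can be checked directly. The following is a minimal sketch on USArrests with scale. = TRUE, so the relevant eigenvalues are those of the correlation matrix; the names ev and Xs are introduced here only for illustration.

```r
pc <- prcomp(USArrests, scale. = TRUE)

# sdev^2 equals the eigenvalues of the correlation matrix (because scale. = TRUE).
ev <- eigen(cor(USArrests), symmetric = TRUE)$values
stopifnot(isTRUE(all.equal(pc$sdev^2, ev)))

# x is the centred and scaled data multiplied by the rotation matrix.
Xs <- scale(USArrests, center = pc$center, scale = pc$scale)
stopifnot(isTRUE(all.equal(pc$x, Xs %*% pc$rotation, check.attributes = FALSE)))

# The scores are uncorrelated, with variances sdev^2.
stopifnot(isTRUE(all.equal(cov(pc$x), diag(pc$sdev^2), check.attributes = FALSE)))
```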
Details
The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy. The print method for these objects prints the results in a nice format and the plot method produces a scree plot.
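As a rough sketch of the SVD route described above (not the internal source of prcomp), the standard deviations can be recovered from the singular values of the centred data matrix, and the right singular vectors agree with the eigenvectors of the covariance matrix up to sign. The names X, sv and sdev_svd are illustrative.

```r
# Centre the data; no scaling, to match prcomp's defaults.
X  <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)
sv <- svd(X, nu = 0)                  # left singular vectors are not needed

# Singular values divided by sqrt(N - 1) give the standard deviations.
sdev_svd <- sv$d / sqrt(nrow(X) - 1)

pc <- prcomp(USArrests)
stopifnot(isTRUE(all.equal(pc$sdev, sdev_svd)))

# Eigenvectors are determined only up to sign, so compare absolute values.
ev <- eigen(cov(USArrests), symmetric = TRUE)
stopifnot(isTRUE(all.equal(abs(unname(pc$rotation)), abs(sv$v))))
stopifnot(isTRUE(all.equal(abs(unname(pc$rotation)), abs(ev$vectors))))
```

On a small, well-conditioned data set such as this one both routes agree to within all.equal's default tolerance; the accuracy advantage of the SVD shows up mainly when the covariance matrix is ill-conditioned.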
Unlike [princomp](/link/princomp?package=stats&version=3.5.3), variances are computed with the usual divisor \(N - 1\).
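The effect of the divisor can be seen by comparing the two functions on the same data. This is a small sketch using USArrests; p1 and p2 are illustrative names.

```r
n  <- nrow(USArrests)
p1 <- prcomp(USArrests)     # divisor N - 1
p2 <- princomp(USArrests)   # divisor N

# The two sets of standard deviations differ by the factor sqrt((N - 1) / N).
stopifnot(isTRUE(all.equal(unname(p2$sdev), p1$sdev * sqrt((n - 1) / n))))

# prcomp's first variance matches the largest eigenvalue of cov(), which also uses N - 1.
stopifnot(isTRUE(all.equal(p1$sdev[1]^2, eigen(cov(USArrests))$values[1])))
```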
Note that scale = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.
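A short sketch of this restriction follows, using a made-up matrix X whose column k is constant. Dropping constant columns before the call is one possible workaround and is an assumption about what the analysis needs, not a recommendation from this page.

```r
# Toy data: column 'k' is constant.
X <- cbind(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3), k = 5)

# Scaling a zero-variance column would divide by zero, so prcomp() stops.
res <- tryCatch(prcomp(X, scale. = TRUE), error = function(e) conditionMessage(e))
print(res)

# Possible workaround: keep only the columns with non-zero standard deviation.
keep <- apply(X, 2, sd) > 0
pc <- prcomp(X[, keep, drop = FALSE], scale. = TRUE)
summary(pc)
```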
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) Multivariate Analysis. London: Academic Press.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer-Verlag.
See Also
[biplot.prcomp](/link/biplot.prcomp?package=stats&version=3.5.3), [screeplot](/link/screeplot?package=stats&version=3.5.3), [princomp](/link/princomp?package=stats&version=3.5.3), [cor](/link/cor?package=stats&version=3.5.3), [cov](/link/cov?package=stats&version=3.5.3), [svd](/link/svd?package=stats&version=3.5.3), [eigen](/link/eigen?package=stats&version=3.5.3).
Examples
# NOT RUN {
C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
all.equal(S, crossprod(C))
set.seed(17)
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C ## ==> cov(Z) ~= C'C = S
all.equal(cov(Z), S, tol = 0.08)
pZ <- prcomp(Z, tol = 0.1)
summary(pZ) # only ~14 PCs (out of 32)
## or choose only 3 PCs more directly:
pz3 <- prcomp(Z, rank. = 3)
summary(pz3) # same numbers as the first 3 above
stopifnot(ncol(pZ$rotation) == 14, ncol(pz3$rotation) == 3,
all.equal(pz3$sdev, pZ$sdev, tol = 1e-15)) # exactly equal typically
# }
# NOT RUN {
## signs are random
require(graphics)
## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests) # inappropriate
prcomp(USArrests, scale = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale = TRUE)
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale = TRUE))
biplot(prcomp(USArrests, scale = TRUE))
# }