Factor analysis (original) (raw)

Statistical method

This article is about factor loadings. For factorial design, see Factorial experiment.

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors plus "error" terms, hence factor analysis can be thought of as a special case of errors-in-variables models.[1]

Simply put, the factor loading of a variable quantifies the extent to which the variable is related to a given factor.[2]

A common rationale behind factor analytic methods is that the information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis is commonly used in psychometrics, personality psychology, biology, marketing, product management, operations research, finance, and machine learning. It may help to deal with data sets where there are large numbers of observed variables that are thought to reflect a smaller number of underlying/latent variables. It is one of the most commonly used interdependency techniques and is used when the relevant set of variables shows a systematic interdependence and the objective is to identify the latent factors that account for this commonality.

The model attempts to explain a set of $p$ observations in each of $n$ individuals with a set of $k$ common factors ($f_{j,m}$), where there are fewer factors per unit than observations per unit ($k<p$). Each individual has $k$ of their own common factors, and these are related to the observations via the factor loading matrix ($L\in\mathbb{R}^{p\times k}$), for a single observation, according to

$x_{i,m}-\mu_{i}=l_{i,1}f_{1,m}+\dots+l_{i,k}f_{k,m}+\varepsilon_{i,m}$

where

$x_{i,m}$ is the value of the $i$th observation of the $m$th individual,
$\mu_{i}$ is the observation mean for the $i$th observation,
$l_{i,j}$ is the loading of the $i$th observation on the $j$th factor,
$f_{j,m}$ is the value of the $j$th factor of the $m$th individual, and
$\varepsilon_{i,m}$ is the $(i,m)$th unobserved stochastic error term with mean zero and finite variance.
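The scalar model can be written directly as a function. This is a minimal sketch in plain Python; the function name `observation` and the numbers in the call are hypothetical, chosen only to show how one value $x_{i,m}$ is assembled from a mean, $k$ loadings, $k$ factor values, and an error term.

```python
def observation(mu_i, loadings_i, factors_m, eps_im):
    """Return x_{i,m} = mu_i + l_{i,1} f_{1,m} + ... + l_{i,k} f_{k,m} + eps_{i,m}.

    loadings_i holds l_{i,1..k} (row i of L); factors_m holds f_{1..k,m}
    (column m of F). Both must have length k.
    """
    if len(loadings_i) != len(factors_m):
        raise ValueError("need exactly one loading per factor")
    return mu_i + sum(l * f for l, f in zip(loadings_i, factors_m)) + eps_im


# Hypothetical values with k = 2 factors.
x = observation(mu_i=50.0, loadings_i=[2.0, 0.5], factors_m=[1.0, -2.0], eps_im=0.3)
print(x)  # 50 + 2*1 + 0.5*(-2) + 0.3, approximately 51.3
```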

In matrix notation

$X-\mathrm{M}=LF+\varepsilon$

where the observation matrix is $X\in\mathbb{R}^{p\times n}$, the loading matrix $L\in\mathbb{R}^{p\times k}$, the factor matrix $F\in\mathbb{R}^{k\times n}$, the error term matrix $\varepsilon\in\mathbb{R}^{p\times n}$, and the mean matrix $\mathrm{M}\in\mathbb{R}^{p\times n}$, whose $(i,m)$th element is simply $\mathrm{M}_{i,m}=\mu_{i}$.
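A minimal NumPy sketch of the matrix form, assuming small hypothetical sizes ($p=6$, $n=8$, $k=2$) and randomly drawn loadings, factors, and errors. It builds the mean matrix $\mathrm{M}$ by repeating the vector of means across the $n$ columns, then checks that one entry of $X$ matches the scalar formula.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 6, 8, 2                          # hypothetical sizes with k < p

mu = rng.normal(size=p)                    # observation means mu_i
L = rng.normal(size=(p, k))                # loading matrix, p x k
F = rng.standard_normal((k, n))            # factor matrix, k x n
eps = 0.1 * rng.standard_normal((p, n))    # error term matrix, p x n
M = np.tile(mu[:, None], (1, n))           # mean matrix with M[i, m] = mu[i]

X = M + L @ F + eps                        # X - M = L F + eps, so X is p x n

# The matrix form agrees entry by entry with the scalar model.
i, m = 2, 5
x_im = mu[i] + sum(L[i, j] * F[j, m] for j in range(k)) + eps[i, m]
print(X[i, m], x_im)
```

Each column of $X$ holds one individual's $p$ observations, which is why $\mathrm{M}$ repeats the same column $n$ times.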

We also impose the following assumptions on $F$:

$F$ and $\varepsilon$ are independent.
$\mathrm{E}(F)=0$, where $\mathrm{E}$ denotes expectation.
$\mathrm{Cov}(F)=I$, where $\mathrm{Cov}$ is the covariance matrix and $I$ is the identity matrix; this ensures that the factors are uncorrelated.

Suppose $\mathrm{Cov}(X-\mathrm{M})=\Sigma$. Then

$\Sigma=\mathrm{Cov}(X-\mathrm{M})=\mathrm{Cov}(LF+\varepsilon),$

and therefore, from the first two conditions imposed above (independence of $F$ and $\varepsilon$, and $\mathrm{E}(F)=0$), $\mathrm{E}[LF]=L\,\mathrm{E}[F]=0$ and $\mathrm{Cov}(LF+\varepsilon)=\mathrm{Cov}(LF)+\mathrm{Cov}(\varepsilon)$, giving

$\Sigma=L\,\mathrm{Cov}(F)L^{T}+\mathrm{Cov}(\varepsilon),$

or, setting $\Psi:=\mathrm{Cov}(\varepsilon)$ and using $\mathrm{Cov}(F)=I$,

$\Sigma=LL^{T}+\Psi.$
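The identity $\Sigma=LL^{T}+\Psi$ can be checked numerically by simulation. The sketch below rests on assumptions not stated in the text: Gaussian factors and errors, and a diagonal $\Psi$ (errors independent across observations, the usual choice in factor analysis). With many simulated individuals, the sample covariance of $X-\mathrm{M}$ should be close to $LL^{T}+\Psi$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 5, 2, 200_000                      # many individuals so sample moments settle

L = rng.normal(size=(p, k))                  # loading matrix, p x k
psi = rng.uniform(0.2, 1.0, size=p)          # assumed diagonal of Psi = Cov(eps)

F = rng.standard_normal((k, n))              # E(F) = 0, Cov(F) = I
eps = rng.standard_normal((p, n)) * np.sqrt(psi)[:, None]  # drawn independently of F
X_minus_M = L @ F + eps                      # X - M = L F + eps

Sigma_model = L @ L.T + np.diag(psi)         # Sigma = L L^T + Psi
Sigma_sample = np.cov(X_minus_M)             # rows are variables (np.cov default rowvar=True)
max_gap = np.max(np.abs(Sigma_sample - Sigma_model))
print(f"largest entrywise gap: {max_gap:.4f}")
```

The gap shrinks roughly like $1/\sqrt{n}$ as more individuals are simulated.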

For any orthogonal matrix $Q$, if we set $L'=LQ$ and $F'=Q^{T}F$, the criteria for being factors and factor loadings still hold: $L'F'=LQQ^{T}F=LF$, $\mathrm{E}(F')=Q^{T}\mathrm{E}(F)=0$, and $\mathrm{Cov}(F')=Q^{T}\mathrm{Cov}(F)Q=Q^{T}Q=I$. Hence a set of factors and factor loadings is unique only up to an orthogonal transformation.
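The rotational ambiguity is easy to see numerically. This sketch draws a random orthogonal $Q$ from the QR decomposition of a Gaussian matrix (one convenient way to obtain an orthogonal matrix, assumed here for illustration) and checks that $L'F'=LF$ and $L'L'^{T}=LL^{T}$, so neither the data nor the implied covariance can distinguish $L$ from $LQ$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, n = 6, 2, 4                              # hypothetical sizes
L = rng.normal(size=(p, k))                    # original loadings
F = rng.standard_normal((k, n))                # original factors

# A random k x k orthogonal matrix from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))

L_rot = L @ Q                                  # L' = L Q
F_rot = Q.T @ F                                # F' = Q^T F

print(np.allclose(L_rot @ F_rot, L @ F))       # same common part L F
print(np.allclose(L_rot @ L_rot.T, L @ L.T))   # same L L^T, hence same Sigma
```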

Suppose a psychologist has the hypothesis that there are two kinds of intelligence, "verbal intelligence" and "mathematical intelligence", neither of which is directly observed.[note 1] Evidence for the hypothesis is sought in the examination scores of 1000 students in each of 10 different academic fields. If each student is chosen randomly from a large population, then each student's 10 scores are random variables. The psychologist's hypothesis may say that, for each of the 10 academic fields, the score averaged over the group of all students who share a common pair of values for verbal and mathematical "intelligence" is some constant times their level of verbal intelligence plus another constant times their level of mathematical intelligence; that is, it is a linear combination of those two "factors". The numbers for a particular subject, by which the two kinds of intelligence are multiplied to obtain the expected score, are posited by the hypothesis to be the same for all pairs of intelligence levels, and are called the "factor loadings" for this subject. For example, the hypothesis may hold that the predicted average student's aptitude in the field of astronomy is

{10 × the student's verbal intelligence} + {6 × the student's mathematical intelligence}.

The numbers 10 and 6 are the factor loadings associated with astronomy. Other academic subjects may have different factor loadings.
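The astronomy prediction is a weighted sum of the two factor levels. In this sketch the loadings 10 and 6 come from the example; the intelligence levels passed in are hypothetical values, and `predicted_score` is an illustrative name.

```python
ASTRONOMY_LOADINGS = (10.0, 6.0)   # (verbal, mathematical) loadings from the example


def predicted_score(loadings, verbal, mathematical):
    """Predicted average score: loading_verbal * verbal + loading_math * mathematical."""
    load_verbal, load_math = loadings
    return load_verbal * verbal + load_math * mathematical


# Hypothetical intelligence levels for one (verbal, mathematical) pair.
score = predicted_score(ASTRONOMY_LOADINGS, verbal=1.5, mathematical=2.0)
print(score)  # 10*1.5 + 6*2.0 = 27.0
```

A different subject would use the same function with its own pair of loadings.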

Two students assumed to have identical degrees of verbal and mathematical intelligence may have different measured aptitudes in astronomy, both because individual aptitudes differ from the average aptitudes predicted above and because of measurement error itself. Such differences make up what is collectively called the "error", a statistical term for the amount by which an individual, as measured, differs from what is average for, or predicted by, his or her levels of intelligence (see errors and residuals in statistics).

The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data.
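To make the inference step concrete, the sketch below simulates data shaped like the example (1000 students by 10 subjects, two factors) and estimates the loadings and error variances by maximum likelihood using an EM iteration. EM is only one of several estimation methods for factor analysis; the function `fit_factor_analysis`, the simulated "true" loadings, and the Gaussian assumptions are all illustrative and are not taken from the text.

```python
import numpy as np


def fit_factor_analysis(X, k, n_iter=500, seed=0):
    """Maximum-likelihood factor analysis via EM (illustrative sketch).

    X is an (n, p) array: one row per individual, one column per observed
    variable. Returns loadings L (p, k), error variances psi (p,), and the
    per-individual log-likelihood recorded after each iteration.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                    # subtract estimated means mu_i
    S = Xc.T @ Xc / n                          # sample covariance, p x p
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(p, k))     # arbitrary small starting loadings
    psi = np.diag(S).copy()                    # start error variances at total variances
    history = []
    for _ in range(n_iter):
        Sigma = L @ L.T + np.diag(psi)
        # E-step: beta maps a mean-centered observation to the posterior mean of its factors.
        beta = L.T @ np.linalg.inv(Sigma)                  # k x p
        Ezz = np.eye(k) - beta @ L + beta @ S @ beta.T     # average E[f f^T | x]
        # M-step: new loadings, then the diagonal of Psi.
        L = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.maximum(np.diag(S - L @ beta @ S), 1e-8)
        Sigma = L @ L.T + np.diag(psi)
        _, logdet = np.linalg.slogdet(Sigma)
        history.append(-0.5 * (p * np.log(2 * np.pi) + logdet
                               + np.trace(np.linalg.solve(Sigma, S))))
    return L, psi, history


# Simulate data shaped like the example: 1000 students, 10 subjects, 2 factors.
rng = np.random.default_rng(42)
n_students, n_subjects, n_factors = 1000, 10, 2
true_L = rng.uniform(0.3, 0.9, size=(n_subjects, n_factors))   # hypothetical loadings
true_psi = rng.uniform(0.3, 0.6, size=n_subjects)               # hypothetical error variances
F = rng.standard_normal((n_factors, n_students))
eps = rng.standard_normal((n_subjects, n_students)) * np.sqrt(true_psi)[:, None]
scores = (true_L @ F + eps).T                                    # 1000 x 10: 10,000 numbers

L_hat, psi_hat, history = fit_factor_analysis(scores, k=n_factors)
Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)                   # fitted covariance
```

Because a set of loadings is unique only up to an orthogonal transformation, `L_hat` is best compared with `true_L` through $LL^{T}$ (or after a rotation), not entry by entry.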

