On the distribution of the largest eigenvalue in principal components analysis

Open Access

April 2001 On the distribution of the largest eigenvalue in principal components analysis

Iain M. Johnstone

Ann. Statist. 29(2): 295-327 (April 2001). DOI: 10.1214/aos/1009210544

Abstract

Let x(1) denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x(1) is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a p-variate Wishart distribution on n degrees of freedom with identity covariance.
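The equivalence stated above can be checked numerically. The following is a minimal sketch, assuming NumPy; the dimensions, seed, and variable names are illustrative choices and do not come from the paper.

```python
import numpy as np

# Illustrative dimensions and seed (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n, p = 8, 5

# n x p matrix with independent standard Gaussian entries.
X = rng.standard_normal((n, p))

# Square of the largest singular value of X (svd returns them in descending order).
sq_largest_sv = np.linalg.svd(X, compute_uv=False)[0] ** 2

# Largest eigenvalue of the p x p matrix X'X (eigvalsh returns them in ascending order).
largest_eig = np.linalg.eigvalsh(X.T @ X)[-1]

print(sq_largest_sv, largest_eig)
```

The two printed values should agree up to floating-point rounding, since the eigenvalues of X′X are the squared singular values of X.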

Consider the limit of large p and n with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of x(1) approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for n and p as small as 5.
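The centering and scaling can be illustrated with a small Monte Carlo sketch. It assumes NumPy, and it reads the scaling constant with the cube root applied to the whole factor (1/√(n−1) + 1/√p). The function names, sample sizes, and seed are hypothetical choices made for illustration; this is not the simulation design used in the paper.

```python
import numpy as np


def centering_scaling(n, p):
    """Return (mu_p, sigma_p) as given in the abstract."""
    a, b = np.sqrt(n - 1), np.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def standardized_largest_eigenvalue(n, p, reps, rng):
    """Simulate (x(1) - mu_p) / sigma_p for `reps` Gaussian n x p matrices."""
    mu, sigma = centering_scaling(n, p)
    out = np.empty(reps)
    for i in range(reps):
        X = rng.standard_normal((n, p))
        # x(1): square of the largest singular value of X.
        top = np.linalg.svd(X, compute_uv=False)[0] ** 2
        out[i] = (top - mu) / sigma
    return out


# Illustrative settings with n/p = 2 >= 1 (assumed values).
rng = np.random.default_rng(0)
samples = standardized_largest_eigenvalue(n=100, p=50, reps=500, rng=rng)
print("sample mean:", samples.mean())
print("sample std: ", samples.std())
```

If the approximation holds, the sample mean and standard deviation should land near the commonly tabulated Tracy–Widom order 1 values of roughly −1.21 and 1.27, up to Monte Carlo error and finite-size effects.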

The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large p multivariate distribution theory may be easier to apply in practice than their fixed p counterparts.

Citation


Iain M. Johnstone. "On the distribution of the largest eigenvalue in principal components analysis." Ann. Statist. 29 (2) 295 - 327, April 2001. https://doi.org/10.1214/aos/1009210544

Information

Published: April 2001

First available in Project Euclid: 24 December 2001

Digital Object Identifier: 10.1214/aos/1009210544

Subjects:

Primary: 62F20, 62H25

Secondary: 33C45, 60H25

Keywords: empirical orthogonal functions, Fredholm determinant, Karhunen–Loève transform, Laguerre ensemble, Laguerre polynomial, Largest eigenvalue, largest singular value, Liouville–Green method, Painlevé equation, Plancherel–Rotach asymptotics, Random matrix theory, Tracy–Widom distribution, Wishart distribution

Rights: Copyright © 2001 Institute of Mathematical Statistics