Asymptotically exact error analysis for the generalized $\ell_2^2$-LASSO
Related papers
Precise error analysis of the LASSO
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
A classical problem that arises in numerous signal processing applications asks for the reconstruction of an unknown, $k$-sparse signal $x_0 \in \mathbb{R}^n$ from underdetermined, noisy, linear measurements $y = Ax_0 + z \in \mathbb{R}^m$. One standard approach is to solve the convex program $\hat{x} = \arg\min_x \|y - Ax\|_2 + \lambda \|x\|_1$, which is known as the $\ell_2$-LASSO. We assume that the entries of the sensing matrix $A$ and of the noise vector $z$ are i.i.d. Gaussian with variances $1/m$ and $\sigma^2$, respectively. In the large-system limit, when the problem dimensions grow to infinity at fixed ratios, we precisely characterize the limiting behavior of the normalized squared error $\|\hat{x} - x_0\|_2^2 / \sigma^2$. Our numerical illustrations validate our theoretical predictions.
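A minimal sketch (not taken from the paper) of how the $\ell_2$-LASSO estimate and its normalized squared error could be computed numerically with cvxpy; the problem sizes, sparsity level, and $\lambda$ below are illustrative choices only.

```python
# Minimal sketch: solve the ell_2-LASSO with cvxpy and report the
# normalized squared error ||xhat - x0||_2^2 / sigma^2.
# Dimensions, sparsity and lambda are illustrative, not from the paper.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, k, sigma, lam = 400, 200, 20, 0.1, 1.0

x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
A = rng.standard_normal((m, n)) / np.sqrt(m)                  # iid N(0, 1/m) entries
y = A @ x0 + sigma * rng.standard_normal(m)                   # noisy measurements

x = cp.Variable(n)
# ell_2-LASSO: un-squared residual norm plus ell_1 penalty
cp.Problem(cp.Minimize(cp.norm(y - A @ x, 2) + lam * cp.norm(x, 1))).solve()

nse = np.sum((x.value - x0) ** 2) / sigma ** 2
print("normalized squared error:", nse)
```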
The LASSO risk: asymptotic results and real world examples
2010
We consider the problem of learning a coefficient vector $x_0 \in \mathbb{R}^N$ from noisy linear observations $y = Ax_0 + w \in \mathbb{R}^n$. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator $\hat{x}$. In this case, a popular approach consists in solving an ℓ1-penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN).
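For concreteness, a generic iterative soft-thresholding (ISTA) sketch for the ℓ1-penalized least-squares objective $(1/2)\|y - Ax\|_2^2 + \lambda \|x\|_1$ mentioned above; this is a standard solver written for illustration, not the algorithm analyzed in the paper, and the step size and iteration count are assumptions.

```python
# Generic ISTA sketch for (1/2)*||y - A x||_2^2 + lam * ||x||_1 (LASSO / BPDN).
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding with fixed step 1 / ||A||_2^2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                    # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 300)) / 10.0
    x0 = np.zeros(300); x0[:10] = 1.0
    y = A @ x0 + 0.05 * rng.standard_normal(100)
    print("nonzeros in estimate:", np.count_nonzero(np.round(ista(A, y, lam=0.05), 3)))
```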
Discussion: A tale of three cousins: Lasso, L2Boosting and Dantzig
The Annals of Statistics, 2007
We would like to congratulate the authors for their thought-provoking and interesting paper. The Dantzig paper addresses the timely topic of high-dimensional data modeling, which has been the center of much research lately and where many exciting results have been obtained. It also falls in the very hot area at the interface of statistics and optimization: ℓ1-constrained minimization in linear models for computationally efficient model selection, or sparse model estimation (Chen, Donoho and Saunders, and Tibshirani). The sparsity consideration indicates a trend in high-dimensional data modeling advancing from prediction, the hallmark of machine learning, to sparsity, a proxy for interpretability. This trend has been greatly fueled by the participation of statisticians in machine learning research. In particular, the Lasso (Tibshirani [17]) is the focus of many sparsity studies in terms of both theoretical analysis (Knight and Fu [10], Greenshtein and Ritov [9], van de Geer [19], Bunea, Tsybakov and Wegkamp [3], Meinshausen and Bühlmann [13], Zhao and Yu [23] and Wainwright [20]) and fast algorithm development (Osborne, Presnell and Turlach [15] and Efron et al. [8]).
A Universal Analysis of Large-Scale Regularized Least Squares Solutions
2017
A problem that has been of recent interest in statistical inference, machine learning and signal processing is that of understanding the asymptotic behavior of regularized least squares solutions under random measurement matrices (or dictionaries). The Least Absolute Shrinkage and Selection Operator (LASSO, or least-squares with ℓ1 regularization) is perhaps one of the most interesting examples. Precise expressions for the asymptotic performance of LASSO have been obtained for a number of different cases, in particular when the elements of the dictionary matrix are sampled independently from a Gaussian distribution. It has also been empirically observed that the resulting expressions remain valid when the entries of the dictionary matrix are independently sampled from certain non-Gaussian distributions. In this paper, we confirm these observations theoretically when the distribution is sub-Gaussian. We further generalize the previous expressions for a broader family of regulari...
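The universality observation can be probed empirically. The hedged sketch below compares the average LASSO error under a Gaussian design against a Rademacher (±1) design with matched variance, using scikit-learn's Lasso (whose objective is $(1/(2n))\|y - Xw\|_2^2 + \alpha \|w\|_1$); the dimensions, sparsity, noise level and α are illustrative assumptions, not the paper's setup.

```python
# Hedged empirical check of universality: mean squared error of the LASSO
# under Gaussian vs Rademacher designs with the same entry variance 1/m.
import numpy as np
from sklearn.linear_model import Lasso

def avg_error(design, trials=20, m=200, n=400, k=20, sigma=0.1, alpha=0.01):
    rng = np.random.default_rng(0)
    errs = []
    for _ in range(trials):
        if design == "gaussian":
            A = rng.standard_normal((m, n)) / np.sqrt(m)
        else:  # Rademacher (+/-1) entries, rescaled to variance 1/m
            A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
        x0 = np.zeros(n)
        x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
        y = A @ x0 + sigma * rng.standard_normal(m)
        xhat = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(A, y).coef_
        errs.append(np.sum((xhat - x0) ** 2))
    return np.mean(errs)

print("Gaussian design  :", avg_error("gaussian"))
print("Rademacher design:", avg_error("rademacher"))
```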
Sparsity and smoothness via the fused lasso
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2005
The lasso penalizes a least squares regression by the sum of the absolute values (L1-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the 'fused lasso', a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences, i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the 'hinge' loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.
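A minimal cvxpy sketch of the fused lasso objective described above, i.e. least squares plus L1 penalties on both the coefficients and their successive differences; the data, λ1 and λ2 below are illustrative assumptions, not the authors' examples.

```python
# Illustrative fused lasso: 0.5*||y - X b||^2 + lam1*||b||_1 + lam2*sum_j |b_{j+1} - b_j|.
# The two penalties encourage sparse and piecewise-constant coefficient profiles.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, p = 60, 200
beta0 = np.zeros(p); beta0[40:60] = 2.0; beta0[120:140] = -1.0  # piecewise-constant truth
X = rng.standard_normal((N, p))
y = X @ beta0 + 0.5 * rng.standard_normal(N)

lam1, lam2 = 1.0, 5.0                         # illustrative tuning parameters
beta = cp.Variable(p)
fit = 0.5 * cp.sum_squares(y - X @ beta)
penalty = lam1 * cp.norm(beta, 1) + lam2 * cp.norm(beta[1:] - beta[:-1], 1)
cp.Problem(cp.Minimize(fit + penalty)).solve()
print("nonzero coefficients:", np.count_nonzero(np.round(beta.value, 3)))
```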
Precise Error Analysis of Regularized M-Estimators in High Dimensions
IEEE Transactions on Information Theory
A popular approach for estimating an unknown signal $x_0 \in \mathbb{R}^n$ from noisy, linear measurements $y = Ax_0 + z \in \mathbb{R}^m$ is via solving a so-called regularized M-estimator: $\hat{x} := \arg\min_x \mathcal{L}(y - Ax) + \lambda f(x)$. Here, $\mathcal{L}$ is a convex loss function, $f$ is a convex (typically non-smooth) regularizer, and $\lambda > 0$ is a regularization parameter. We analyze the squared-error performance $\|\hat{x} - x_0\|_2^2$ of such estimators in the high-dimensional proportional regime where $m, n \to \infty$ and $m/n \to \delta$. The design matrix $A$ is assumed to have i.i.d. Gaussian entries; only minimal and rather mild regularity conditions are imposed on the loss function, the regularizer, and on the noise and signal distributions. We show that the squared error converges in probability to a nontrivial limit that is given as the solution to a minimax convex-concave optimization problem over four scalar optimization variables. We identify a new summary parameter, termed the Expected Moreau envelope, that plays a central role in the error characterization. The precise nature of the results permits an accurate performance comparison between different instances of regularized M-estimators and allows one to optimally tune the involved parameters (e.g., the regularization parameter and the number of measurements). The key ingredient of our proof is the Convex Gaussian Min-max Theorem (CGMT), which is a tight and strengthened version of a classical Gaussian comparison inequality proved by Gordon in 1988.
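Since the Moreau envelope is the central object here, a small generic illustration may help: for $f(v) = \lambda |v|$, the envelope $e_f(x; \tau) = \min_v \frac{1}{2\tau}(x - v)^2 + \lambda |v|$ has a closed Huber-type form and its minimizer is the soft-thresholding map. The sketch below checks the closed form against brute-force minimization; it is a generic illustration, not the paper's Expected Moreau envelope functional.

```python
# Moreau envelope of f(v) = lam*|v| with parameter tau:
#   e_f(x; tau) = min_v (1/(2*tau))*(x - v)^2 + lam*|v|
# Closed (Huber-type) form vs a brute-force numerical minimization.
import numpy as np

def moreau_env_l1_closed(x, lam, tau):
    ax = np.abs(x)
    return np.where(ax <= lam * tau, ax ** 2 / (2 * tau), lam * ax - lam ** 2 * tau / 2)

def moreau_env_l1_bruteforce(x, lam, tau, grid=np.linspace(-10, 10, 200001)):
    vals = (x - grid) ** 2 / (2 * tau) + lam * np.abs(grid)
    return vals.min()

x, lam, tau = 1.7, 0.8, 0.5
print(moreau_env_l1_closed(x, lam, tau))      # closed form (here: 1.2)
print(moreau_env_l1_bruteforce(x, lam, tau))  # numerical check, approximately equal
```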
IEEE Transactions on Neural Networks, 2004
In the last few years, the support vector machine (SVM) method has motivated new interest in kernel regression techniques. Although the SVM has been shown to exhibit excellent generalization properties in many experiments, it suffers from several drawbacks, both of a theoretical and a technical nature: the absence of probabilistic outputs, the restriction to Mercer kernels, and the steep growth of the number of support vectors with increasing size of the training set. In this paper, we present a different class of kernel regressors that effectively overcome the above problems. We call this approach generalized LASSO regression. It has a clear probabilistic interpretation, can handle learning sets that are corrupted by outliers, produces extremely sparse solutions, and is capable of dealing with large-scale problems. For regression functionals which can be modeled as iteratively reweighted least-squares (IRLS) problems, we present a highly efficient algorithm with guaranteed global convergence. This defines a unified framework for sparse regression models in the very rich class of IRLS models, including various types of robust regression models and logistic regression. Performance studies for many standard benchmark datasets effectively demonstrate the advantages of this model over related approaches.
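To make the IRLS idea concrete, here is a generic sketch for a plain ℓ1-penalized least-squares problem (not the authors' kernel-based formulation): the non-smooth penalty is replaced at each iteration by a reweighted quadratic, so every step is a weighted ridge regression solved in closed form. The warm start, ε and iteration count are assumptions.

```python
# Generic IRLS sketch for 0.5*||y - X b||^2 + lam*||b||_1.
# Each step solves the surrogate 0.5*||y - X b||^2 + (lam/2)*sum_j b_j^2 / w_j
# with weights w_j = |b_j^(prev)| + eps, i.e. a weighted ridge regression.
import numpy as np

def irls_lasso(X, y, lam, n_iter=50, eps=1e-6):
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    b = np.linalg.solve(XtX + lam * np.eye(p), Xty)    # ridge warm start
    for _ in range(n_iter):
        w = np.abs(b) + eps                            # reweighting from previous iterate
        b = np.linalg.solve(XtX + lam * np.diag(1.0 / w), Xty)
    return b
```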
The LASSO Estimator: Distributional Properties
2016
The least absolute shrinkage and selection operator (LASSO) is a popular technique for simultaneous estimation and model selection. There have been many studies on the large-sample asymptotic distributional properties of the LASSO estimator, but it is also well known that the asymptotic results can give a misleading picture of the LASSO estimator's actual finite-sample behavior. The finite-sample distribution of the LASSO estimator has previously been studied for the special case of orthogonal models. The aim of this work is to generalize the finite-sample distributional properties of the LASSO estimator to a real, linear measurement model in Gaussian noise. We derive an expression for the finite-sample characteristic function of the LASSO estimator; we then use the Fourier slice theorem to obtain an approximate expression for the marginal probability density functions of the one-dimensional components of a linear transformation of the LASSO estimator.
Uncertainty Quantification in Lasso-Type Regularization Problems
2020
Properties and Iterative Methods for the Q-Lasso
Abstract and Applied Analysis, 2013
We introduce the Q-lasso, which generalizes the well-known lasso of Tibshirani (1996), with Q a closed convex subset of a Euclidean m-space for some integer m ≥ 1. This set Q can be interpreted as the set of errors within a given tolerance level when linear measurements are taken to recover a signal/image via the lasso. Solutions of the Q-lasso depend on a tuning parameter γ. In this paper, we obtain basic properties of the solutions as a function of γ. Because of ill-posedness, we also apply ℓ1-ℓ2 regularization to the Q-lasso. In addition, we discuss iterative methods for solving the Q-lasso, which include the proximal-gradient algorithm and the projection-gradient algorithm.
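A hedged sketch of the proximal-gradient idea mentioned above, written for one common Q-lasso formulation, $\min_x \frac{1}{2}\,\mathrm{dist}(Ax, Q)^2 + \gamma \|x\|_1$, under the assumption that Q is a box so its projection is simple clipping; this illustrates the algorithmic template, not the paper's exact scheme or parameter choices.

```python
# Proximal-gradient sketch for min_x 0.5*dist(Ax, Q)^2 + gamma*||x||_1 with Q a box.
# The gradient of 0.5*dist(z, Q)^2 is z - P_Q(z), so each step is a gradient
# step through A followed by soft-thresholding (the prox of the ell_1 penalty).
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def q_lasso_box(A, lo, hi, gamma, n_iter=500):
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = A @ x
        residual = z - np.clip(z, lo, hi)      # Ax - P_Q(Ax)
        x = soft_threshold(x - step * (A.T @ residual), step * gamma)
    return x
```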