Penalized estimation of high-dimensional models under a generalized sparsity condition
Related papers
Flexible Shrinkage Estimation in High-Dimensional Varying Coefficient Models
2010
We consider the problem of simultaneous variable selection and constant coefficient identification in high-dimensional varying coefficient models based on B-spline basis expansion. Both objectives can be viewed as model selection problems, and we show that they can be achieved by a double shrinkage strategy. We apply the adaptive group Lasso penalty in models involving a diverging number of covariates, which can be much larger than the sample size, although we assume, via model sparsity, that the number of relevant variables is smaller than the sample size. Such so-called ultra-high dimensional settings are especially challenging in the semiparametric models considered here and have not been dealt with before. Under suitable conditions, we show that consistency in terms of both variable selection and constant coefficient identification can be achieved, as well as the oracle property of the constant coefficients. Even when the zero and constant coefficients are known a priori, our results appear to be new, since the model then reduces to a semivarying coefficient model (a.k.a. partially linear varying coefficient model) with a diverging number of covariates. We also theoretically demonstrate the consistency of a semiparametric BIC-type criterion in this high-dimensional context, extending several previous results. The finite sample behavior of the estimator is evaluated in Monte Carlo studies.
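The abstract describes the estimator only at a high level; the following is a minimal Python sketch of the underlying construction, assuming a varying-coefficient model y_i = sum_j beta_j(u_i) x_ij + eps_i in which each coefficient function is expanded in a B-spline basis, so that selecting a covariate amounts to selecting the whole group of its spline coefficients. The data-generating process, the basis size, and the use of scikit-learn's SplineTransformer are illustrative assumptions, and a plain Lasso stands in where the paper applies an adaptive group Lasso.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10                      # sample size and number of covariates (illustrative)
u = rng.uniform(size=n)             # index variable of the varying coefficients
X = rng.normal(size=(n, p))

# True model: beta_1(u) = sin(2*pi*u), beta_2(u) = 2 (constant), all other coefficients zero.
y = np.sin(2 * np.pi * u) * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# B-spline basis expansion of the index variable: each beta_j(u) ~ sum_k theta_jk B_k(u).
basis = SplineTransformer(degree=3, n_knots=6, include_bias=True)
B = basis.fit_transform(u.reshape(-1, 1))      # shape (n, K)
K = B.shape[1]

# Design matrix: block j holds x_j * B_k(u) for k = 1..K, so covariate j
# corresponds to one group of K columns.
Z = np.hstack([X[:, [j]] * B for j in range(p)])

# Stand-in for the adaptive group Lasso: an ordinary Lasso over the expanded design.
# A group penalty would shrink each K-column block jointly instead of entrywise.
fit = Lasso(alpha=0.05).fit(Z, y)
theta = fit.coef_.reshape(p, K)
group_norms = np.linalg.norm(theta, axis=1)
print("per-covariate coefficient-block norms:", np.round(group_norms, 2))
```

A covariate whose whole coefficient block is shrunk to zero is dropped from the model; a block that is well approximated by a constant function of u corresponds to a constant coefficient.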
High-dimensional inference in linear models: robustness and adaptivity to model sparsity
In high-dimensional linear models, the sparsity assumption is typically made, stating that most of the model parameters have value equal to zero. Under the sparsity assumption, estimation and, recently, inference as well as the fundamental limits of detection have been well studied. However, in certain cases the sparsity assumption may be violated, and a large number of covariates can be expected to be associated with the response, indicating that possibly all, rather than just a few, model parameters are different from zero. A natural example is genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that the current inferential methods are sensitive to the sparsity assumption and may in turn result in severe bias: lack of control of the Type I error is apparent once the model is not sparse. In this article, we propose a new inferential method, named CorrT, which is robust and adaptive to the sparsity assumption. CorrT is shown to have Type I error approaching the nominal level, regardless of how sparse or dense the model is. Specifically, the developed test is based on a moment condition induced by the hypothesis and the covariate structure of the model design. Such a construction circumvents the fundamental difficulty of accurately estimating non-sparse high-dimensional models. As a result, the proposed test guards against large estimation errors caused by the potential absence of sparsity and, at the same time, adapts to the model sparsity. In fact, CorrT is also shown to be optimal whenever sparsity holds. Numerical experiments show favorable performance of CorrT compared to existing methods. We also apply CorrT to a real dataset and confirm some known discoveries related to HER2+ cancer patients and gene-to-gene interactions.
Sparse Additive Models
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2009
We present a new class of methods for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. SpAM is essentially a functional version of the grouped lasso of Yuan and Lin. SpAM is also closely related to the COSSO model, but decouples smoothing and sparsity, enabling the use of arbitrary nonparametric smoothers. We give an analysis of the theoretical properties of sparse additive models, and present empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high-dimensional data.
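As a concrete illustration of the decoupling of smoothing and sparsity described above, here is a minimal Python sketch of a SpAM-style sparse backfitting loop: each partial residual is smoothed against its covariate and the whole fitted component is then soft-thresholded by its empirical norm. The Nadaraya-Watson smoother, bandwidth, threshold level, and simulated data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kernel_smooth(x, r, h=0.1):
    """Nadaraya-Watson smoother of residuals r against a single covariate x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return w @ r / w.sum(axis=1)

def spam_backfit(X, y, lam, n_iter=50, h=0.1):
    """Sparse backfitting: smooth each partial residual, then soft-threshold
    the whole component by its empirical norm (functional soft thresholding)."""
    n, p = X.shape
    f = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - y.mean() - f.sum(axis=1) + f[:, j]   # partial residual
            p_j = kernel_smooth(X[:, j], r_j, h)
            s_j = np.sqrt(np.mean(p_j ** 2))               # component norm
            f[:, j] = max(0.0, 1.0 - lam / s_j) * p_j if s_j > 0 else 0.0
            f[:, j] -= f[:, j].mean()                       # center each component
    return f

# Illustrative data: only the first two of ten covariates matter.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 10))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)
f_hat = spam_backfit(X, y, lam=0.05)
print("component norms:", np.round(np.sqrt((f_hat ** 2).mean(axis=0)), 2))
```

Because only the inner smoothing step depends on the smoother, any nonparametric smoother can be substituted for the kernel smoother above, which is the sense in which smoothing and sparsity are decoupled.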
2017
We propose a new sparse estimation method, termed MIC (Minimum approximated Information Criterion), for generalized linear models (GLMs) in fixed dimensions. The essential ingredient of MIC is the approximation of the ℓ0-norm with a continuous unit dent function. In addition, a reparameterization step is devised to enforce sparsity in parameter estimates while maintaining the smoothness of the objective function. MIC yields superior performance in sparse estimation by optimizing the approximated information criterion without reducing the search space, and it is computationally advantageous since no selection of tuning parameters is required. Moreover, the reparameterization tactic leads to valid significance testing results that are free of post-selection inference issues. We explore the asymptotic properties of MIC and illustrate its usage with both simulated experiments and empirical examples.
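To make the idea of approximating the ℓ0-norm with a smooth surrogate concrete, here is a small Python sketch for a Gaussian linear model. The specific surrogate w(b) = tanh(a·b²) and its scale a are illustrative assumptions standing in for the unit dent function used in MIC, and the paper's reparameterization step is omitted.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

a = 50.0  # sharpness of the smooth surrogate w(b) = tanh(a * b^2) ~ I(b != 0)

def approx_bic(beta):
    """BIC-type criterion with the l0 penalty replaced by a smooth surrogate."""
    rss = np.sum((y - X @ beta) ** 2)
    df = np.sum(np.tanh(a * beta ** 2))        # smooth stand-in for the l0-norm
    return n * np.log(rss / n) + np.log(n) * df

fit = minimize(approx_bic, x0=np.zeros(p), method="BFGS")
print("estimated coefficients:", np.round(fit.x, 2))
```

Because the penalty is smooth, the criterion can be handed to a standard gradient-based optimizer over the full parameter space, which is what "without reducing the search space" refers to; no separate tuning parameter needs to be cross-validated since the BIC weight log(n) is fixed.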
Direct estimation of low-dimensional components in additive models
The Annals of Statistics, 1998
Additive regression models have turned out to be a useful statistical tool in analyses of high-dimensional data sets. Recently, an estimator of additive components based on marginal integration was introduced by Linton and Nielsen. The explicit definition of this estimator permits fast computation and allows an asymptotic distribution theory. In this paper an asymptotic treatment of this estimator is offered for several models, and a modification of the procedure is introduced. We consider weighted marginal integration for local linear fits and show that this estimator has several advantages.
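The marginal integration idea can be stated compactly: fit a full-dimensional smoother m̂(x1, x2) and then average it over the empirical distribution of the nuisance covariate to recover the additive component of x1. Below is a minimal Python sketch with a product-kernel Nadaraya-Watson smoother on two covariates; the bandwidths and simulated data are illustrative assumptions, and the weighted and local linear refinements studied in the paper are not implemented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + (x2 - 0.5) ** 2 + rng.normal(scale=0.2, size=n)

def nw2(t1, t2, h=0.1):
    """Full-dimensional Nadaraya-Watson estimate m_hat(t1, t2) with a product kernel."""
    w = np.exp(-0.5 * (((x1 - t1) / h) ** 2 + ((x2 - t2) / h) ** 2))
    return np.sum(w * y) / np.sum(w)

def marginal_integration(t1, h=0.1):
    """Additive component of x1 at t1: average m_hat(t1, X2i) over the sample."""
    return np.mean([nw2(t1, x2i, h) for x2i in x2])

grid = np.linspace(0.05, 0.95, 10)
est = np.array([marginal_integration(t) for t in grid])
print(np.round(est - est.mean(), 2))   # centered estimate of the x1 component
```

The estimator is explicit, which is why its computation is fast and its asymptotic distribution can be worked out directly, in contrast to iterative backfitting schemes.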
Analysis of Penalized Regression Methods in a Simple Linear Model on the High-Dimensional Data
American Journal of Theoretical and Applied Statistics, 2019
Shrinkage methods for linear regression were developed over the last ten years to address the weakness of ordinary least squares (OLS) regression with respect to prediction accuracy. At the same time, high-dimensional data are quickly growing in many areas due to technological advances that make it possible to collect data with a large number of variables. In this paper, shrinkage methods were used to estimate regression coefficients effectively for the high-dimensional multiple regression model, where there are fewer samples than predictors, and regularization approaches have become the methods of choice for analyzing such high-dimensional data. We used three regularization methods based on penalized regression to select the appropriate model. Lasso, Ridge, and Elastic Net have desirable features: they can simultaneously perform regularization and selection of appropriate predictor variables and estimate their effects. Here, we compared the performance of these three regularized linear regression methods, using cross-validation to reach the optimal point, and evaluated prediction accuracy using the mean squared error (MSE). Through a simulation study and an analysis of real data, we found that all three methods are capable of producing appropriate models. Overall, the Elastic Net had better prediction accuracy than the other methods; in the simulation study in particular, it outperformed the other two methods with the lowest MSE.
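The comparison described above is straightforward to reproduce in outline. Here is a minimal Python sketch using scikit-learn's cross-validated Lasso, Ridge, and Elastic Net on simulated data with more predictors than observations; the simulation design and tuning grids are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
n, p = 80, 200                           # fewer samples than predictors
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, -1, 2]           # only a handful of active predictors
y = X @ beta + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Each model selects its penalty strength by cross-validation on the training split.
models = {
    "Lasso": LassoCV(cv=5),
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 25)),
    "Elastic Net": ElasticNetCV(cv=5, l1_ratio=[0.2, 0.5, 0.8]),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: test MSE = {mse:.2f}")
```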
Electronic Journal of Statistics, 2012
In this paper, we formulate partially linear single-index models as bi-index dimension reduction models for the purpose of identifying significant covariates in both the linear part and the single-index part through only one combined index in a dimension reduction approach. This is different from all existing dimension reduction methods in the literature, which in general identify two basis directions in the subspace spanned by the parameter vectors of interest, rather than the two parameter vectors themselves. This approach makes the identification and the subsequent estimation and variable selection easier than existing methods for multi-index models. When the number of parameters diverges with the sample size, we adopt a coordinate-independent sparse estimation procedure to select significant covariates and estimate the corresponding parameters. The resulting sparse dimension reduction estimators are shown to be consistent and asymptotically normal with the oracle property. Simulations are conducted to evaluate the performance of the proposed method, and a real data set is analysed for illustration.
Is there sparsity beyond additive models?
2012
In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to devise a sparse nonparametric model, avoiding linear or additive models. The key intuition is to measure the importance of each variable in the model by making use of partial derivatives. Based on this idea we propose and study a new regularizer and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm induces a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance.
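The core intuition above, scoring variables by how much the regression function varies along them, can be illustrated without the full RKHS machinery. The following Python sketch fits a kernel ridge regressor and uses finite-difference partial derivatives to rank variables; it is a diagnostic approximation under assumed bandwidth and ridge settings, not the derivative-based regularizer or the proximal algorithm proposed in the paper.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(5)
n, p = 300, 6
X = rng.uniform(-1, 1, size=(n, p))
# Only the first two variables enter the (nonlinear, non-additive) regression function.
y = np.sin(np.pi * X[:, 0] * X[:, 1]) + rng.normal(scale=0.1, size=n)

model = KernelRidge(kernel="rbf", gamma=1.0, alpha=0.1).fit(X, y)

def importance(j, eps=1e-3):
    """Empirical L2 norm of the finite-difference partial derivative along variable j."""
    Xp, Xm = X.copy(), X.copy()
    Xp[:, j] += eps
    Xm[:, j] -= eps
    deriv = (model.predict(Xp) - model.predict(Xm)) / (2 * eps)
    return np.sqrt(np.mean(deriv ** 2))

print("variable importances:", np.round([importance(j) for j in range(p)], 2))
```

In the paper this quantity enters the penalty itself, so that irrelevant variables are driven out during estimation rather than screened after the fact as in the sketch above.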
Estimation of Sparse Functional Additive Models with Adaptive Group LASSO
Statistica Sinica
We study a flexible model to tackle the issue of lack of fit in the conventional functional linear regression. This model, called the sparse functional additive model, is used to characterize the relationship between a functional predictor and a scalar response of interest. The effect of the functional predictor is represented in a nonparametric additive form, where the arguments are the scaled functional principal component scores. Component selection and smoothing are considered when fitting the model to reduce the variability and enhance the prediction accuracy, while providing an adequate fit. To achieve these goals, we propose using the adaptive group LASSO method to select relevant components and smoothing splines to obtain a smoother estimate of those relevant components. Simulation studies show that the proposed estimation method compares favourably with various conventional methods in terms of prediction accuracy and component selection. The advantage of our estimation method is further demonstrated in two real data examples.
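To make the structure of the model concrete, the sketch below computes functional principal component scores by an SVD of discretized, centered curves and maps them to [0, 1] through their empirical CDF, one plausible choice of scaled scores to serve as arguments of the additive components (the paper's exact scaling may differ). The grid, the number of retained components, and the simulated curves are illustrative assumptions, and the adaptive group LASSO fitting step itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 100, 50                               # n curves observed on an m-point grid
t = np.linspace(0, 1, m)
# Illustrative functional predictor: random combinations of two smooth basis curves.
scores_true = rng.normal(size=(n, 2))
Xfun = scores_true[:, [0]] * np.sin(2 * np.pi * t) + scores_true[:, [1]] * np.cos(2 * np.pi * t)
Xfun += rng.normal(scale=0.1, size=(n, m))

# Functional PCA via SVD of the centered curves.
Xc = Xfun - Xfun.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 4                                         # number of retained components
scores = U[:, :k] * s[:k]                     # functional principal component scores

# Scale each score to [0, 1] via its empirical CDF; these scaled scores are the
# arguments of the nonparametric additive components, which would then be expanded
# in spline bases and selected with an adaptive group LASSO.
ranks = scores.argsort(axis=0).argsort(axis=0)
scaled_scores = (ranks + 0.5) / n
print(scaled_scores[:3].round(2))
```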