A Tuning-free Robust and Efficient Approach to High-dimensional Regression

Abstract

We introduce a novel approach to high-dimensional regression with theoretical guarantees. The new procedure overcomes the challenge of tuning parameter selection for the Lasso and possesses several appealing properties. It uses an easily simulated tuning parameter that automatically adapts to both the unknown random error distribution and the correlation structure of the design matrix. It is robust, with substantial efficiency gains for heavy-tailed random errors, while maintaining high efficiency for normal random errors. Compared with alternative robust regression procedures, it is also equivariant under scale transformations of the response variable. Computationally, it can be solved efficiently via linear programming. Theoretically, under weak conditions on the random error distribution, we establish a finite-sample error bound with a near-oracle rate for the new estimator with the simulated tuning parameter. Our results make useful contributions...
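The abstract does not give the explicit construction of the simulated tuning parameter, but its description (easily simulated, adaptive to the design matrix, and pivotal with respect to the unknown error distribution) is consistent with a Monte Carlo scheme of the following general form. The sketch below is an illustrative assumption rather than the paper's definition: it simulates the sup-norm of a rank-based score at beta = 0, which depends only on the design matrix X and a random permutation (since the ranks of i.i.d. errors are uniformly distributed over permutations for any continuous error distribution), and takes an upper quantile of the simulated values as the tuning parameter. The function name, score form, and default constants are hypothetical.

```python
import numpy as np

def simulate_tuning_parameter(X, n_sim=500, alpha=0.10, scale=1.0, seed=None):
    """Hedged sketch of a simulated, data-adaptive tuning parameter.

    Assumption (not taken from the paper): the tuning parameter is an upper
    quantile of the sup-norm of a rank-based score evaluated at beta = 0.
    Because the ranks of i.i.d. continuous errors form a uniform random
    permutation, this distribution can be simulated from X alone, without
    knowing the error distribution or its variance.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sup_norms = np.empty(n_sim)
    for b in range(n_sim):
        ranks = rng.permutation(n) + 1              # ranks of hypothetical errors
        weights = 2.0 * ranks / (n + 1.0) - 1.0     # centred rank scores in (-1, 1)
        score = X.T @ weights / n                   # rank-based score at beta = 0
        sup_norms[b] = np.max(np.abs(score))
    # upper (1 - alpha) quantile of the simulated sup-norms, times a constant
    return scale * np.quantile(sup_norms, 1.0 - alpha)

# usage (illustrative): lam = simulate_tuning_parameter(X) would then be plugged
# into an L1-penalized, piecewise-linear objective.
```

Under this assumed construction, the resulting penalty level enters an L1-penalized objective with a piecewise-linear loss, which is the kind of problem that can be reformulated and solved as a linear program, in line with the computational claim in the abstract.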
