Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators
Electronic Journal of Statistics
For the problem of high-dimensional sparse linear regression, it is known that an ℓ0-based estimator can achieve a 1/n "fast" rate for prediction error without any conditions on the design matrix, whereas in the absence of restrictive conditions on the design matrix, popular polynomial-time methods only guarantee the 1/√n "slow" rate. In this paper, we show that the slow rate is intrinsic to a broad class of M-estimators. In particular, for estimators based on minimizing a least-squares cost function together with a (possibly nonconvex) coordinate-wise separable regularizer, there is always a "bad" local optimum such that the associated prediction error is lower bounded by a constant multiple of 1/√n. For convex regularizers, this lower bound applies to all global optima. The theory is applicable to many popular estimators, including convex ℓ1-based methods as well as M-estimators based on nonconvex regularizers, such as the SCAD penalty or the MCP regularizer. In addition, we show that bad local optima are very common, in that a broad class of local minimization algorithms with random initialization typically converges to a bad solution.
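As a sketch of the estimator class referred to above, in assumed notation not taken from the abstract itself (design matrix X ∈ R^{n×p}, response vector y ∈ R^n, coordinate-wise penalty ρ with regularization weight λ_n), a coordinate-separable M-estimator takes the form

\[
  \widehat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p}
    \Bigl\{ \tfrac{1}{2n}\,\lVert y - X\theta \rVert_2^2
            + \sum_{j=1}^{p} \rho_{\lambda_n}(\theta_j) \Bigr\}.
\]

Taking ρ_λ(t) = λ|t| recovers the convex ℓ1 (Lasso) penalty, while SCAD and MCP are nonconvex choices of ρ; the prediction error in question can be written as (1/n)‖X(θ̂ − θ*)‖_2^2, where θ* denotes the true regression vector.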