Rodeo: Sparse, greedy nonparametric regression

Bandwidth Selection in Nonparametric Regression with Large Sample Size

Proceedings

In the context of nonparametric regression estimation, the behaviour of kernel methods such as the Nadaraya-Watson or local linear estimators is heavily influenced by the value of the bandwidth parameter, which determines the trade-off between bias and variance. The selection of an optimal bandwidth, in the sense of minimizing some risk function (MSE, MISE, etc.), is therefore a crucial issue. However, estimating an optimal bandwidth from the whole sample can be very expensive in computing time in a Big Data context, owing to the computational complexity of some of the most widely used bandwidth selection algorithms (leave-one-out cross-validation, for example, has O(n²) complexity). To overcome this problem, we propose two methods that estimate the optimal bandwidth for several subsamples of the large dataset and then extrapolate the result to the original sample size, making use of the asymptotic expression of the MISE-optimal bandwidth. Preliminar...
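
A minimal sketch of the extrapolation idea, assuming the standard asymptotic law h(n) ≈ C·n^(−1/5) for second-order kernels; the selector `select_h` (e.g. an expensive leave-one-out cross-validation routine), the subsample size and the number of subsamples are placeholders, not the paper's exact scheme:

```python
import numpy as np

def extrapolate_bandwidth(x, y, select_h, m=2000, n_subsamples=10,
                          rate=-1/5, seed=0):
    """Estimate an optimal bandwidth on small subsamples, then rescale to
    the full sample size via the asymptotic law h(n) ~ C * n**rate.

    `select_h` is any (possibly expensive) bandwidth selector, applied only
    to subsamples of size m. The exponent rate = -1/5 is the MISE-optimal
    rate for Nadaraya-Watson / local linear estimation with a second-order
    kernel.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    hs = []
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=m, replace=False)
        hs.append(select_h(x[idx], y[idx]))
    h_m = np.median(hs)              # stabilise across subsamples
    return h_m * (n / m) ** rate     # extrapolate: h(n) = h(m) * (n/m)**rate
```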

Selection of variables and dimension reduction in high-dimensional non-parametric regression

Electronic Journal of Statistics, 2008

We consider an ℓ1-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension d of the input variable X is very large (sometimes growing with the number of observations). Estimation of a β-regular regression function f cannot be faster than the slow rate n^{−2β/(2β+d)}. Fortunately, in some situations, f depends on only a few of the coordinates of X. In this paper, we construct two procedures. The first selects, with high probability, these coordinates. Then, using this subset selection, we run a local polynomial estimator (on the set of relevant coordinates) to estimate the regression function at the rate n^{−2β/(2β+d*)}, where d*, the "real" dimension of the problem (the exact number of variables on which f depends), has replaced the dimension d of the design. To achieve this result, we use an ℓ1-penalization method in this non-parametric setup.
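
A hedged two-stage sketch of the select-then-smooth idea: here a Lasso stands in for the paper's ℓ1 selection procedure, and a Nadaraya-Watson smoother stands in for the local polynomial second stage; `alpha` and `h` are illustrative tuning values.

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_stage_estimate(X, y, x0, alpha=0.1, h=0.3):
    """Stage 1: l1-penalised screening of the coordinates of X.
    Stage 2: a kernel (Nadaraya-Watson) estimate at the point x0 using
    only the selected coordinates. Lasso and Nadaraya-Watson are
    simplified stand-ins for the paper's procedures."""
    sel = np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_)  # kept coords
    Z, z0 = X[:, sel], x0[sel]                                # reduced design
    w = np.exp(-np.sum((Z - z0) ** 2, axis=1) / (2 * h ** 2))  # Gaussian kernel
    return w @ y / w.sum(), sel
```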

Bandwidth selection for local linear regression smoothers

Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2002

The paper presents a general strategy for selecting the bandwidth of nonparametric regression estimators and specializes it to local linear regression smoothers. The procedure requires the sample to be divided into a training sample and a testing sample. Using the training sample we first compute a family of regression smoothers indexed by their bandwidths. Next we select the bandwidth by minimizing the empirical quadratic prediction error on the testing sample. The resulting bandwidth satisfies a finite sample oracle inequality which holds for all bounded regression functions. This permits asymptotically optimal estimation for nearly any regression function. The practical performance of the method is illustrated by a simulation study which shows good finite sample behaviour of our method compared with other bandwidth selection procedures.
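
The split-sample rule translates directly into code. Below is a minimal one-dimensional sketch, assuming a Gaussian-kernel local linear smoother and a user-chosen bandwidth grid; all function names and defaults are illustrative, not the paper's implementation.

```python
import numpy as np

def loclin(x_train, y_train, x_eval, h):
    """Local linear estimate at each point of x_eval (Gaussian kernel)."""
    out = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        d = x_train - x0
        sw = np.sqrt(np.exp(-0.5 * (d / h) ** 2))   # sqrt of kernel weights
        Xd = np.column_stack([np.ones_like(d), d])
        # weighted least squares of y on (1, d); intercept = fit at x0
        beta = np.linalg.lstsq(sw[:, None] * Xd, sw * y_train, rcond=None)[0]
        out[j] = beta[0]
    return out

def select_bandwidth(x_tr, y_tr, x_te, y_te, grid):
    """Pick the bandwidth minimising the empirical quadratic prediction
    error on the testing sample, as in the split-sample strategy."""
    errs = [np.mean((y_te - loclin(x_tr, y_tr, x_te, h)) ** 2) for h in grid]
    return grid[int(np.argmin(errs))]
```

A typical call would be `select_bandwidth(x_tr, y_tr, x_te, y_te, np.linspace(0.05, 1.0, 30))`, with the grid range adapted to the scale of the design.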

GRID: A variable selection and structure discovery method for high dimensional nonparametric regression

The Annals of Statistics

We consider nonparametric regression in high dimensions where only a relatively small subset of a large number of variables are relevant and may have nonlinear effects on the response. We develop methods for variable selection, structure discovery and estimation of the true low-dimensional regression function, allowing any degree of interaction among the relevant variables, which need not be specified a priori. The proposed method, called GRID, combines empirical likelihood based marginal testing with the local linear estimation machinery in a novel way to select the relevant variables. Further, it provides a simple graphical tool for identifying the low-dimensional nonlinear structure of the regression function. Theoretical results establish consistency of variable selection and structure discovery, as well as an oracle risk property of the GRID estimator of the regression function, allowing the dimension d of the covariates to grow with the sample size n at the rate d = O(n^a) for any a ∈ (0, ∞) and the number of relevant covariates r to grow at the rate r = O(n^γ) for some γ ∈ (0, 1), under regularity conditions that, in particular, require finiteness of certain absolute moments of the error variables depending on a. Finite sample properties of GRID are investigated in a moderately large simulation study.
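
The selection step can be caricatured as marginal screening. The sketch below substitutes a crude variance-explained check for GRID's empirical-likelihood marginal tests, so it illustrates the screening idea only, not the actual procedure; `h` and `tau` are illustrative values.

```python
import numpy as np

def marginal_screen(X, y, h=0.3, tau=0.05):
    """Crude marginal screening: keep covariate j if a one-dimensional
    Nadaraya-Watson fit of y on X[:, j] explains at least a fraction tau
    of the variance beyond the constant fit. A simplified stand-in for
    GRID's empirical-likelihood marginal tests, for illustration only."""
    n, d = X.shape
    base = np.var(y)
    keep = []
    for j in range(d):
        xj = X[:, j]
        W = np.exp(-0.5 * ((xj[:, None] - xj[None, :]) / h) ** 2)
        fit = W @ y / W.sum(axis=1)      # NW fit at the sample points
        if 1 - np.var(y - fit) / base >= tau:
            keep.append(j)
    return keep
```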

Finite sample performance of kernel-based regression methods for non-parametric additive models under common bandwidth selection criterion

MPRA Paper, 2007

In this paper we investigate the finite sample performance of four kernel-based estimators that are currently available for additive nonparametric regression models: the classic backfitting estimator (CBE), the smooth backfitting estimator (SBE), the marginal integration estimator (MIE) and two versions of a two-stage estimator (2SE1, 2SE2), the first proposed by Kim, Linton and Hengartner (1999) and the second proposed in this paper. The bandwidths are selected for each estimator by minimizing the respective asymptotic approximation of the mean average squared error (AMASE). In our simulations, we are particularly concerned with the performance of these estimators under this unified data-driven bandwidth selection method, since in this case neither the asymptotic nor the finite sample properties of the estimators are currently available. The comparison is based on the estimators' average squared error. Our Monte Carlo results suggest that the CBE is the best performing kernel-based procedure.
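
For reference, the classic backfitting estimator (CBE) is the simplest of the compared procedures: cycle over covariates, smoothing the partial residuals one coordinate at a time. A minimal sketch with Nadaraya-Watson component smoothers and a single illustrative bandwidth shared across coordinates:

```python
import numpy as np

def backfit(X, y, h=0.3, n_iter=20):
    """Classic backfitting for an additive model y = a + sum_j f_j(x_j) + e:
    iterate over coordinates, smoothing the partial residuals against each
    covariate with a one-dimensional Nadaraya-Watson smoother."""
    n, d = X.shape
    alpha = y.mean()
    F = np.zeros((n, d))                  # current component fits
    S = []                                # per-coordinate smoother matrices
    for j in range(d):
        W = np.exp(-0.5 * ((X[:, j][:, None] - X[:, j][None, :]) / h) ** 2)
        S.append(W / W.sum(axis=1, keepdims=True))
    for _ in range(n_iter):
        for j in range(d):
            r = y - alpha - F.sum(axis=1) + F[:, j]   # partial residuals
            F[:, j] = S[j] @ r
            F[:, j] -= F[:, j].mean()     # identifiability: centre each f_j
    return alpha, F
```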

Component selection and smoothing in multivariate nonparametric regression

Annals of Statistics, 2006

We propose a new method for model selection and model fitting in multivariate nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components. We give an equivalent formulation of the COSSO estimator which leads naturally to an iterative algorithm. We compare the COSSO with MARS, a popular method that builds functional ANOVA models, in simulations and real examples. The COSSO method can be extended to classification problems, and we compare its performance with that of a number of machine learning algorithms on real datasets. The COSSO gives very competitive performance in these studies.
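
The structure of the penalty is easy to state. A sketch of the objective, with notation paraphrased from the abstract rather than taken from the paper:

```latex
% COSSO objective: residual sum of squares plus the sum (not the squared
% norm) of the SS-ANOVA component norms. P^alpha f denotes the alpha-th
% functional component of f, tau > 0 the smoothing parameter.
\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
          + \tau \sum_{\alpha=1}^{p} \lVert P^{\alpha} f \rVert
```

Summing norms rather than squared norms is what yields the soft-thresholding behaviour, and hence component selection, in direct analogy with the lasso in linear models.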

Data-Driven Bandwidth Selection for Nonparametric Nonstationary Regressions

SSRN Electronic Journal, 2011

We provide a solution to the open problem of bandwidth selection for the nonparametric estimation of potentially nonstationary regressions, a setting in which the popular method of cross-validation has not been justified theoretically. Our procedure is based on minimizing moment conditions involving nonparametric residuals and applies to β-recurrent Markov chains, stationary processes being a special case, as well as nonlinear functions of integrated processes. Local and uniform versions of the criterion are proposed. The selected bandwidths are rate-optimal up to a logarithmic factor, a typical cost of adaptation in other contexts. We further show that the bias induced by (near-)minimax optimality can be removed by virtue of a simple randomized procedure. In a Monte Carlo exercise, we find that our proposed bandwidth selection method, and its subsequent bias correction, fare favorably relative to cross-validation, even in stationary environments.
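
In spirit, the criterion replaces cross-validation by moment conditions on nonparametric residuals. The toy sketch below picks the bandwidth making leave-one-out residuals nearly orthogonal to the regressor; it is only a caricature of the idea, with the instrument, kernel and grid all chosen for illustration.

```python
import numpy as np

def moment_select(x, y, grid):
    """Toy illustration of bandwidth choice via a moment condition on
    nonparametric residuals: pick the h whose leave-one-out residuals are
    closest to orthogonal to the regressor. The paper's actual criterion,
    its recurrent-Markov-chain framework and the randomized bias
    correction are not reproduced here."""
    def loo_resid(h):
        W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(W, 0.0)                    # exclude own observation
        return y - W @ y / (W.sum(axis=1) + 1e-12)
    crit = [abs(np.mean(loo_resid(h) * x)) for h in grid]
    return grid[int(np.argmin(crit))]
```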

Data Sharpening Methods for Bias Reduction in Nonparametric Regression

Annals of Statistics, 2000

We consider methods for kernel regression when the explanatory and/or response variables are adjusted prior to substitution into a conventional estimator. This "data-sharpening" procedure is designed to preserve the advantages of relatively simple, low-order techniques, for example, their robustness against design sparsity problems, yet attain the sorts of bias reductions that are commonly associated only with high-order methods. We consider Nadaraya-Watson and local-linear methods in detail, although data sharpening is applicable more widely. One approach in particular is found to give excellent performance. It involves adjusting both the explanatory and the response variables prior to substitution into a local linear estimator. The change to the explanatory variables enhances resistance of the estimator to design sparsity, by increasing the density of design points in places where the original density had been low. When combined with adjustment of the response variables, it produces a reduction in bias by an order of magnitude. Moreover, these advantages are available in multivariate settings. The data-sharpening step is simple to implement, since it is explicitly defined. It does not involve functional inversion, solution of equations or use of pilot bandwidths.
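
One simple member of the sharpening family adjusts the responses only: replace each y_i by 2y_i − m̂(x_i) from a pilot fit and re-smooth, which cancels the leading bias term. This twicing-style sketch is illustrative and omits the paper's preferred variant, which also adjusts the explanatory variables.

```python
import numpy as np

def nw_fit(x, y, h):
    """Nadaraya-Watson fit at the sample points (Gaussian kernel)."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return W @ y / W.sum(axis=1)

def sharpened_fit(x, y, h):
    """Response sharpening: replace y_i by 2*y_i - mhat(x_i) from a pilot
    fit, then re-smooth. One simple instance of the data-sharpening
    family, not the paper's full scheme (which also perturbs the
    explanatory variables)."""
    y_sharp = 2 * y - nw_fit(x, y, h)   # sharpened responses
    return nw_fit(x, y_sharp, h)
```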

Robust plug-in bandwidth estimators in nonparametric regression

Journal of Statistical Planning and Inference, 1997

In this paper, we propose a robust bandwidth selection method for local M-estimates used in nonparametric regression. We study the asymptotic behavior of the resulting estimates. We use the results of a Monte Carlo study to compare the performance of various competitors for moderate sample sizes. It appears that the robust plug-in bandwidth selector we propose compares favorably to its competitors, despite the need to select a pilot bandwidth. The Monte Carlo study shows that the robust plug-in bandwidth selector is very stable and relatively insensitive to the choice of the pilot.
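
For context, plug-in selectors of this kind target the classical AMISE-optimal bandwidth, substituting pilot estimates for the unknown functionals. A sketch of the usual formula in the standard local linear setting (not the paper's robust criterion; the robust version replaces the error variance by a robust scale functional of the M-residuals):

```latex
% AMISE-optimal global bandwidth for local linear regression with
% second-order kernel K, design density f and error variance sigma^2(x);
% plug-in selectors estimate the integrals from pilot fits.
h_{\mathrm{AMISE}}
  = \left[ \frac{R(K) \int \sigma^2(x)\,dx}
                {\mu_2(K)^2 \int \{ m''(x) \}^2 f(x)\,dx} \right]^{1/5} n^{-1/5},
\qquad R(K) = \int K(u)^2\,du, \quad \mu_2(K) = \int u^2 K(u)\,du.
```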

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

Journal of The American Statistical Association, 2001

Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
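
The best-known penalty in this family is the SCAD of this paper: singular at the origin (hence sparse solutions) and constant beyond aλ (hence bounded, giving low bias for large coefficients). A direct implementation of its closed form, with the paper's suggested a = 3.7:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated elementwise.
    Linear (lasso-like) near zero, quadratic transition on (lam, a*lam],
    constant beyond a*lam: singular at the origin for sparsity, bounded
    for low bias on large coefficients."""
    t = np.abs(np.asarray(theta, dtype=float))
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    pen = np.where(small, lam * t, 0.0)
    pen = np.where(mid, (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)), pen)
    pen = np.where(t > a * lam, (a + 1) * lam**2 / 2, pen)
    return pen
```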