COBRA: A Nonlinear Aggregation Strategy
Related papers
COBRA: A combined regression strategy
Journal of Multivariate Analysis, 2015
A new method for combining several initial estimators of the regression function is introduced. Instead of building a linear or convex optimized combination over a collection of basic estimators r_1, ..., r_M, we use them as a collective indicator of the proximity between the training data and a test observation. This local distance approach is model-free and very fast. More specifically, the resulting nonparametric/nonlinear combined estimator is shown to perform asymptotically at least as well in the L^2 sense as the best combination of the basic estimators in the collective. A companion R package called COBRA (standing for COmBined Regression Alternative) is presented (downloadable at http://cran.r-project.org/web/packages/COBRA/index.html). Substantial numerical evidence is provided on both synthetic and real data sets to assess the excellent performance and velocity of our method in a large variety of prediction problems.
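As a rough illustration of the proximity-based rule sketched in this abstract, the following Python snippet shows one way such a combined estimator could be computed: points of a held-out aggregation sample vote only if enough of the basic machines predict similar values there and at the query point, and the prediction is the plain average of the retained responses. The function name, the thresholds `eps` and `alpha`, and the fallback are illustrative choices, not the interface of the COBRA package.

```python
import numpy as np

def cobra_predict(machines, X_agg, y_agg, x_new, eps=0.1, alpha=1.0):
    """Sketch of a COBRA-style combined prediction (illustrative, not the package API).

    machines : list of fitted regressors, each with a .predict method
    X_agg, y_agg : held-out aggregation sample (not used to fit the machines)
    x_new : a single query point, shape (n_features,)
    eps : proximity threshold (a tuning parameter)
    alpha : fraction of machines that must agree for a training point to vote
    """
    x_new = np.asarray(x_new).reshape(1, -1)
    # Predictions of each machine on the aggregation sample and on the query point.
    preds_agg = np.column_stack([m.predict(X_agg) for m in machines])   # (n, M)
    preds_new = np.array([m.predict(x_new)[0] for m in machines])        # (M,)

    # A point "agrees" with x_new if enough machines predict similar values at both.
    close = np.abs(preds_agg - preds_new) <= eps           # (n, M) boolean
    votes = close.sum(axis=1) >= alpha * len(machines)     # (n,) boolean

    if not votes.any():
        return float(np.mean(y_agg))  # fallback when no point is retained
    # The combined estimate is the average response over the retained points.
    return float(np.mean(np.asarray(y_agg)[votes]))
```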
Aggregating regression procedures to improve performance
Bernoulli, 2004
Methods have been proposed to linearly combine candidate regression procedures to improve estimation accuracy. Applications of these methods in many examples are very successful, pointing to the great potential of combining procedures. A fundamental question regarding combining procedures is: what is the potential gain, and how much does one need to pay for it?
A comparison of model aggregation methods for regression
2003
Combining machine learning models is a means of improving overall accuracy. Various algorithms have been proposed to create aggregate models from other models, and two popular examples for classification are Bagging and AdaBoost. In this paper we examine their adaptation to regression, and benchmark them on synthetic and real-world data. Our experiments reveal that different types of AdaBoost algorithms require different complexities of base models.
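A possible way to reproduce the flavour of such a benchmark with scikit-learn is sketched below; this is not the authors' experimental setup, and the dataset, tree depths, and scoring are illustrative choices. Varying the depth of the base tree gives a quick look at how base-model complexity interacts with Bagging versus AdaBoost in regression.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

# Vary the complexity of the base model (tree depth) for each aggregation scheme.
for depth in (1, 3, 8):
    base = DecisionTreeRegressor(max_depth=depth)
    for name, model in [
        # Note: on scikit-learn < 1.2 the keyword is base_estimator instead of estimator.
        ("Bagging", BaggingRegressor(estimator=base, n_estimators=100, random_state=0)),
        ("AdaBoost", AdaBoostRegressor(estimator=base, n_estimators=100, random_state=0)),
    ]:
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"{name:8s} depth={depth}  CV MSE={mse:.3f}")
```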
Improving the aggregating algorithm for regression
Kernel Ridge Regression (KRR) and the recently developed Kernel Aggregating Algorithm for Regression (KAAR) are regression methods based on least squares. KAAR has theoretical advantages over KRR since a bound on its square loss for the worst case is known that does not hold for KRR. This bound does not make any assumptions about the underlying probability distribution of the data. In practice, however, KAAR performs better only when the data is heavily corrupted by noise or has severe outliers. This is due to the fact that KAAR is similar to KRR but with some fairly strong extra regularisation. In this paper we develop KAAR in such a way as to make it practical for use on real-world data. This is achieved by controlling the amount of extra regularisation. Empirical results (including results on the well-known Boston Housing dataset) suggest that in general our new methods perform as well as or better than KRR, KAAR and Support Vector Machines (SVM) in terms of the square loss they suffer.
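One common description of the Aggregating Algorithm for Regression is that its prediction coincides with ridge regression refitted after appending the test point with a pseudo-label of 0, which is the source of the extra regularisation mentioned above. Under that reading, a minimal sketch contrasting plain KRR with a KAAR-style prediction might look as follows; the RBF kernel choice and function names are illustrative, and this does not implement the paper's tunable variants.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_predict(X, y, x_new, lam=1.0, gamma=1.0):
    """Standard kernel ridge regression prediction at a single point."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return float(rbf_kernel(x_new[None, :], X, gamma) @ alpha)

def kaar_style_predict(X, y, x_new, lam=1.0, gamma=1.0):
    """KAAR-style prediction: ridge regression on the data augmented with the
    test point carrying a pseudo-label of 0, which adds extra shrinkage."""
    Xa = np.vstack([X, x_new[None, :]])
    ya = np.append(y, 0.0)
    K = rbf_kernel(Xa, Xa, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xa)), ya)
    return float(rbf_kernel(x_new[None, :], Xa, gamma) @ alpha)
```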
Aggregating Regression Procedures for a Better Performance
Methods have been proposed to linearly combine candidate regression procedures to improve estimation accuraccy. Applications of these methods in many examples are very succeesful, pointing to the great potential of combining procedures. A fundamental question regarding combining procedure is: What is the potential gain and how much one needs to pay for it?
An Upper Bound for Aggregating Algorithm for Regression with Changing Dependencies
Lecture Notes in Computer Science, 2016
The paper presents a competitive prediction-style upper bound on the square loss of the Aggregating Algorithm for Regression with Changing Dependencies in the linear case. The algorithm is able to compete with a sequence of linear predictors provided the sum of squared Euclidean norms of differences of regression coefficient vectors grows at a sublinear rate.
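In symbols, the drift condition described above can be restated as follows, with notation chosen here: u_1, ..., u_T are the comparator coefficient vectors over T rounds.

```latex
% Comparator sequence of linear predictors u_1, \dots, u_T (notation chosen here);
% the bound is meaningful whenever the total squared drift is sublinear in T:
\sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert_2^2 = o(T).
```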
Aggregation algorithms for neural network ensemble construction
VII Brazilian Symposium on Neural Networks, 2002. SBRN 2002. Proceedings., 2002
How to generate and aggregate base learners to obtain optimal ensemble generalization capabilities is an important question in building composite regression/classification machines. We present an evaluation of several algorithms for aggregating artificial neural networks in the regression setting, including new proposals, and compare them with standard methods in the literature. We also discuss a potential problem with sequential algorithms: the infrequent but damaging selection, through their heuristics, of particularly bad ensemble members. We show that one can cope with this problem by allowing individual weighting of aggregate members. Our algorithms and their weighted modifications are favorably tested against other methods in the literature, producing a performance improvement on the standard statistical databases used as benchmarks.
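The idea of weighting ensemble members individually, so that an occasional bad member is down-weighted rather than given an equal share, can be illustrated by a simple scheme (hypothetical, not one of the paper's algorithms) that fits non-negative weights on a validation set.

```python
import numpy as np
from scipy.optimize import nnls

def weighted_ensemble(preds_val, y_val, preds_test):
    """Weight ensemble members individually by non-negative least squares on a
    validation set, so a poorly performing member gets a near-zero weight.

    preds_val  : (n_val, M) member predictions on validation data
    y_val      : (n_val,)   validation targets
    preds_test : (n_test, M) member predictions on test data
    """
    w, _ = nnls(preds_val, y_val)              # non-negative member weights
    if w.sum() > 0:
        w = w / w.sum()                        # normalise to a convex combination
    else:
        w = np.full(preds_val.shape[1], 1.0 / preds_val.shape[1])
    return preds_test @ w                      # weighted ensemble prediction
```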
Combining least-squares regressions: an upper-bound on mean-squared error
2005
For Gaussian regression, we develop and analyse methods for combining estimators from various models. For squared-error loss, an unbiased estimator of the risk of a mixture of general estimators is developed. Special attention is given to the case that the components are least-squares projections into arbitrary linear subspaces. We relate the unbiased risk estimate for the mixture estimator to estimates of the risks achieved by the components. This results in accurate bounds on the risk and its unbiased estimate: without advance knowledge of which model is best, the resulting performance is comparable to what is achieved by the best of the individual models.
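For orientation, the classical unbiased risk identity for a single linear smoother under Gaussian noise (a standard Mallows/Stein-type fact, not the paper's mixture result itself) reads as follows.

```latex
% Classical unbiased risk estimate (Mallows' C_p / Stein) for a linear
% estimator \hat{\mu} = S y with y \sim N(\mu, \sigma^2 I_n):
\mathbb{E}\,\lVert \hat{\mu} - \mu \rVert^2
  = \mathbb{E}\Big[\, \lVert y - S y \rVert^2 + 2\sigma^2 \operatorname{tr}(S) - n\sigma^2 \Big].
```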
A general procedure to combine estimators
Computational Statistics & Data Analysis, 2016
A general method to combine several estimators of the same quantity is investigated. In the spirit of model and forecast averaging, the final estimator is computed as a weighted average of the initial ones, where the weights are constrained to sum to one. In this framework, the optimal weights, minimizing the quadratic loss, are entirely determined by the mean square error matrix of the vector of initial estimators. The averaging estimator is built using an estimation of this matrix, which can be computed from the same dataset. A non-asymptotic error bound on the averaging estimator is derived, leading to asymptotic optimality under mild conditions on the estimated mean square error matrix. This method is illustrated on standard statistical problems in parametric and semi-parametric models where the averaging estimator outperforms the initial estimators in most cases.
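The closed form behind this construction is standard: minimising w'Σw over weights summing to one, where Σ is the mean square error matrix of the vector of initial estimators, gives w* = Σ⁻¹1 / (1'Σ⁻¹1). A minimal sketch, assuming an estimate of that matrix is available (the function name is illustrative):

```python
import numpy as np

def averaging_weights(Sigma_hat):
    """Weights minimising w' Sigma w subject to sum(w) = 1, where Sigma_hat
    estimates the mean squared error matrix of the initial estimators."""
    ones = np.ones(Sigma_hat.shape[0])
    w = np.linalg.solve(Sigma_hat, ones)   # Sigma^{-1} 1
    return w / w.sum()                     # divide by 1' Sigma^{-1} 1

# Usage sketch: combine a vector of initial estimates with the fitted weights.
# estimates = np.array([theta_hat_1, theta_hat_2, theta_hat_3])
# theta_combined = averaging_weights(Sigma_hat) @ estimates
```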
Aggregation for Gaussian regression
The Annals of Statistics, 2007
This paper studies statistical aggregation procedures in the regression setting. A motivating factor is the existence of many different methods of estimation, leading to possibly competing estimators. We consider here three different types of aggregation: model selection (MS) aggregation, convex (C) aggregation and linear (L) aggregation. The objective of (MS) is to select the optimal single estimator from the list; that of (C) is to select the optimal convex combination of the given estimators; and that of (L) is to select the optimal linear combination of the given estimators. We are interested in evaluating the rates of convergence of the excess risks of the estimators obtained by these procedures. Our approach is motivated by recently published minimax results [Nemirovski, A. (2000). Topics in non-parametric statistics].
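In symbols, with basic estimators f_1, ..., f_M and aggregate f_w = Σ_j w_j f_j, the three problems differ only in the admissible weight set (notation chosen here):

```latex
% Find weights w minimising the risk of f_w = \sum_{j=1}^{M} w_j f_j over
%   (MS)  w \in \{e_1, \dots, e_M\}                               % pick a single estimator
%   (C)   w \in \Lambda^M = \{w : w_j \ge 0,\ \textstyle\sum_j w_j = 1\}  % convex combinations
%   (L)   w \in \mathbb{R}^M                                      % all linear combinations
```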