Reduced rank kernel ridge regression

Two New Kernel Least Squares Based Methods for Regression

2006

Kernel Ridge Regression (KRR) and the Kernel Aggregating Algorithm for Regression (KAAR) are existing regression methods based on Least Squares. KRR is a well established regression technique, while KAAR is the result of relatively recent work. KAAR is similar to KRR but with some extra regularisation that makes it predict better when the data is heavily corrupted by noise. In the general case, however, this extra regularisation is excessive and therefore KRR performs better. In this paper, two new methods for regression, Iterative KAAR (IKAAR) and Controlled KAAR (CKAAR), are introduced. IKAAR and CKAAR make it possible to control the amount of extra regularisation or to remove it completely, which makes them generalisations of both KRR and KAAR. Some properties of these new methods are proved, and their predictive performance on both synthetic and real-world datasets (including the well known Boston Housing dataset) is compared to that of KRR and KAAR. Empirical results that have been checked for statistical significance suggest that in general both IKAAR and CKAAR make predictions that are equivalent to or better than those of KRR and KAAR.
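As a reading aid, the sketch below (my own illustration, not the paper's code) contrasts the dual-form KRR prediction with the kernelised AAR prediction, in which the test point is appended to the kernel matrix with a pseudo-label of zero; that extra term is the source of KAAR's additional regularisation. The Gaussian kernel, the helper `gaussian_kernel`, and the parameter names `a` and `sigma` are assumptions for illustration, and the IKAAR/CKAAR blending schemes themselves are not reproduced.

```python
# Minimal sketch (not the paper's implementation): dual-form KRR and kernel AAR
# predictions with an illustrative Gaussian kernel.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_predict(X, y, x_new, a=1.0, sigma=1.0):
    # KRR dual solution: f(x) = k(x)^T (K + a I)^{-1} y
    K = gaussian_kernel(X, X, sigma)
    k = gaussian_kernel(X, x_new[None, :], sigma).ravel()
    return k @ np.linalg.solve(K + a * np.eye(len(X)), y)

def kaar_predict(X, y, x_new, a=1.0, sigma=1.0):
    # Kernel AAR: the test point is appended to the kernel matrix and its
    # (unknown) label is treated as 0, which adds the extra regularisation.
    Z = np.vstack([X, x_new])
    Kt = gaussian_kernel(Z, Z, sigma)
    yt = np.append(y, 0.0)
    kt = Kt[:, -1]
    return yt @ np.linalg.solve(Kt + a * np.eye(len(Z)), kt)
```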

Ridge Regression Learning Algorithm in Dual Variables

In this paper we study a dual version of the Ridge Regression procedure. It allows us to perform non-linear regression by constructing a linear regression function in a high dimensional feature space. The feature space representation can result in a large increase in the number of parameters used by the algorithm. In order to combat this "curse of dimensionality", the algorithm allows the use of kernel functions, as used in Support Vector methods. We also discuss a powerful family of kernel functions which is constructed using the ANOVA decomposition method from the kernel corresponding to splines with an infinite number of nodes. This paper introduces a regression estimation algorithm which is a combination of these two elements: the dual version of Ridge Regression is applied to the ANOVA enhancement of the infinite-node splines. Experimental results are then presented (based on the Boston Housing data set) which indicate the performance of this algorithm relative to other algorithms.
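For context, the dual-variable ridge regression predictor referred to in the abstract has the standard closed form below (ridge parameter a, kernel k); the ANOVA spline kernel construction itself is not reproduced here.

```latex
\hat{y}(x) \;=\; \mathbf{y}^{\top}\,(K + aI)^{-1}\,\mathbf{k}(x),
\qquad K_{ij} = k(x_i, x_j), \quad k_i(x) = k(x_i, x).
```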

Significant vector learning to construct sparse kernel regression models

Neural Networks, 2007

A novel significant vector (SV) regression algorithm is proposed in this paper, based on an analysis of Chen's orthogonal least squares (OLS) regression algorithm. The proposed regularized SV algorithm finds the significant vectors in a successive greedy process in which, compared with the classical OLS algorithm, the orthogonalization step has been removed. The performance of the proposed algorithm is comparable to that of the OLS algorithm, while it avoids the computational cost of the orthogonalization required by OLS.
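The sketch below conveys the general flavour of greedy forward selection of kernel regressor columns without an explicit orthogonalisation step; it is a simplified stand-in with an ad hoc correlation-based score, not the paper's significant vector criterion or Chen's OLS/ERR procedure. The `ridge` stabiliser and `n_terms` stopping rule are illustrative assumptions.

```python
# Illustrative greedy forward selection of kernel regressor columns.
import numpy as np

def greedy_kernel_selection(K, y, n_terms=10, ridge=1e-6):
    n = K.shape[1]
    selected, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(n_terms):
        # Score each unused column by its squared correlation with the residual.
        scores = [(K[:, j] @ residual) ** 2 / (K[:, j] @ K[:, j] + 1e-12)
                  if j not in selected else -np.inf for j in range(n)]
        selected.append(int(np.argmax(scores)))
        Ks = K[:, selected]
        # Refit coefficients jointly on the selected columns (small ridge for stability).
        coef = np.linalg.solve(Ks.T @ Ks + ridge * np.eye(len(selected)), Ks.T @ y)
        residual = y - Ks @ coef
    return selected, coef
```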

Heteroscedastic kernel ridge regression

Neurocomputing, 2004

In this paper we extend a form of kernel ridge regression (KRR) for data characterised by a heteroscedastic (i.e. input-dependent variance) Gaussian noise process, introduced in Foxall et al. It is shown that the proposed heteroscedastic kernel ridge regression model can give a more accurate estimate of the conditional mean of the target distribution than conventional KRR and also provides an indication of the spread of the target distribution (i.e. predictive error bars). The leave-one-out cross-validation estimate of the conditional mean is used in fitting the model of the conditional variance in order to overcome the inherent bias in maximum likelihood estimates of the variance. The benefits of the proposed model are demonstrated on synthetic and real-world benchmark data sets and for the task of predicting episodes of poor air quality in an urban environment.
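The following minimal sketch shows the general idea under simplifying assumptions: one KRR model for the conditional mean, plus a second KRR model fitted to the log of the squared leave-one-out residuals as a stand-in for the conditional variance. The closed-form LOO residual for a linear smoother is standard; the regularisation parameters and the log-variance target are illustrative choices, not the paper's maximum-likelihood fitting procedure.

```python
# Minimal sketch of heteroscedastic kernel ridge regression (illustrative only).
import numpy as np

def fit_hetero_krr(K, y, lam_mean=1.0, lam_var=1.0, eps=1e-8):
    n = len(y)
    H = K @ np.linalg.inv(K + lam_mean * np.eye(n))   # smoother ("hat") matrix
    y_hat = H @ y
    # Closed-form leave-one-out residuals for a linear smoother.
    loo_res = (y - y_hat) / (1.0 - np.diag(H))
    # Model the log-variance with a second KRR fitted to log squared LOO residuals.
    z = np.log(loo_res ** 2 + eps)
    alpha_var = np.linalg.solve(K + lam_var * np.eye(n), z)
    alpha_mean = np.linalg.solve(K + lam_mean * np.eye(n), y)
    # Predict the mean via k(x)^T alpha_mean and the log-variance via k(x)^T alpha_var.
    return alpha_mean, alpha_var
```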

Fast Randomized Kernel Ridge Regression with Statistical Guarantees

2015

One approach to improving the running time of kernel-based methods is to build a small sketch of the kernel matrix and use it in lieu of the full matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical leverage scores to the setting of kernel ridge regression, we are able to identify a sampling distribution that reduces the size of the sketch (i.e., the required number of columns to be sampled) to the effective dimensionality of the problem. This latter quantity is often much smaller than previous bounds that depend on the maximal degrees of freedom. We give empirical evidence supporting this fact. Our second contribution is to present a fast algorithm to quickly compute coarse approximations to these scores in time linear in the number of samples. More precisely, the running time of the algorithm is O(...
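For small problems, the ridge leverage scores and the effective dimension mentioned in the abstract can be computed exactly as in the sketch below; the paper's contribution is a fast approximation of these quantities, which this sketch does not implement. The n*lam scaling inside the inverse is one common convention and may differ from the paper's.

```python
# Exact ridge leverage scores and effective dimension for KRR (small-n only).
import numpy as np

def ridge_leverage_scores(K, lam):
    n = K.shape[0]
    L = K @ np.linalg.inv(K + lam * n * np.eye(n))
    scores = np.diag(L)      # l_i(lam): ridge leverage score of column i
    d_eff = scores.sum()     # effective dimension = trace(K (K + n*lam*I)^{-1})
    return scores, d_eff

# Columns can then be sampled with probability proportional to these scores to
# build a Nystrom-type sketch of K whose size scales with d_eff.
```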

Sparse Kernel Regressors

Lecture Notes in Computer Science, 2001

Sparse kernel regressors have become popular by applying the support vector method to regression problems. Although this approach has been shown to exhibit excellent generalization properties in many experiments, it suffers from several drawbacks: the absence of probabilistic outputs, the restriction to Mercer kernels, and the steep growth of the number of support vectors with increasing size of the training set. In this paper we present a new class of kernel regressors that effectively overcome the above problems. We call this new approach generalized LASSO regression. It has a clear probabilistic interpretation, produces extremely sparse solutions, can handle learning sets that are corrupted by outliers, and is capable of dealing with large-scale problems.
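A generic L1-penalised kernel expansion conveys the basic mechanism behind such sparse kernel regressors; the sketch below uses an RBF kernel and scikit-learn's Lasso as stand-ins and does not reproduce the paper's generalized LASSO formulation, its probabilistic interpretation, or its robustness to outliers. The parameters `gamma` and `alpha` are illustrative.

```python
# Illustrative L1-penalised kernel expansion (a generic "kernelised lasso").
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

def sparse_kernel_fit(X, y, gamma=0.5, alpha=0.01):
    K = rbf_kernel(X, X, gamma=gamma)          # kernel columns act as basis functions
    model = Lasso(alpha=alpha, fit_intercept=True, max_iter=10000).fit(K, y)
    support = np.flatnonzero(model.coef_)      # surviving "support" points
    return model, support

def sparse_kernel_predict(model, X_train, X_new, gamma=0.5):
    return model.predict(rbf_kernel(X_new, X_train, gamma=gamma))
```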

Sparse Kernel Regression Modelling Based on L1 Significant Vector Learning

2005 International Conference on Neural Networks and Brain, 2005

A novel L1 significant vector (SV) regression algorithm is proposed in this paper. The proposed regularized L1 SV algorithm finds the significant vectors in a successive greedy process. The performance of the proposed algorithm is comparable to that of the OLS algorithm, while it avoids the computational cost of the orthogonalization required by OLS.

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018

We present memory-efficient and scalable algorithms for kernel methods used in machine learning. Using hierarchical matrix approximations for the kernel matrix, the memory requirements, the number of floating point operations, and the execution time are drastically reduced compared to standard dense linear algebra routines. We consider both the general H-matrix hierarchical format and Hierarchically Semi-Separable (HSS) matrices. Furthermore, we investigate the impact of several preprocessing and clustering techniques on the hierarchical matrix compression. Effective clustering of the input leads to a tenfold increase in the efficiency of the compression. The algorithms are implemented using the STRUMPACK solver library. These results confirm that, with correct tuning of the hyperparameters, classification using kernel ridge regression with the compressed matrix does not lose prediction accuracy compared to the exact (uncompressed) kernel matrix, and that our approach can be extended to O(1M) datasets, for which computation with the full kernel matrix becomes prohibitively expensive. We present numerical experiments in a distributed memory environment on up to 1,024 processors of NERSC's Cori supercomputer, using datasets well known to the machine learning community that range in dimension from 8 up to 784.
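The toy sketch below illustrates why clustering helps: after points are reordered by cluster, off-diagonal kernel blocks become numerically low-rank and can be truncated. It is a flat two-level scheme with an RBF kernel, intended only as an illustration; it is not the H or HSS formats, nor the STRUMPACK implementation used in the paper, and `gamma`, `n_clusters`, and `tol` are assumed values.

```python
# Toy illustration: cluster, reorder, and estimate compressibility of off-diagonal blocks.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def compression_ratio(X, gamma=0.1, n_clusters=4, tol=1e-6):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    order = np.argsort(labels)
    Xs = X[order]
    bounds = np.searchsorted(labels[order], np.arange(n_clusters + 1))
    stored = 0
    for i in range(n_clusters):
        for j in range(n_clusters):
            B = rbf_kernel(Xs[bounds[i]:bounds[i+1]], Xs[bounds[j]:bounds[j+1]], gamma=gamma)
            if B.size == 0:
                continue
            if i == j:
                stored += B.size                  # keep diagonal blocks dense
            else:
                s = np.linalg.svd(B, compute_uv=False)
                r = int(np.sum(s > tol * s[0]))   # numerical rank of the block
                stored += r * sum(B.shape)        # storage for low-rank factors U, V
    return stored / (len(X) ** 2)                 # fraction of dense storage needed
```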

Enhanced ridge regressions

Mathematical and Computer Modelling, 2010

With a simple transformation, the ordinary least squares objective can yield a family of modified ridge regressions that outperform the regular ridge model. These models have more stable coefficients and a higher quality of fit as the profile parameter grows.
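The abstract does not spell out the transformation; for reference, the regular ridge estimator that these modified models are compared against is the standard one below (design matrix X, ridge/profile parameter k).

```latex
\hat{\boldsymbol{\beta}}_{\mathrm{ridge}} \;=\; (X^{\top}X + kI)^{-1} X^{\top}\mathbf{y}.
```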

Improved sparse least-squares support vector machines

Neurocomputing, 2002

Earlier work (in press) describes a weighted least-squares formulation of the support vector machine for regression problems and presents a simple algorithm for sparse approximation of the typically fully dense kernel expansions obtained using this method. In this paper, we present an improved method for achieving sparsity in least-squares support vector machines, which takes into account the residuals for all training patterns, rather than only those incorporated in the sparse kernel expansion. The superiority of this algorithm is demonstrated on the motorcycle and Boston housing data sets.
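A toy pruning loop conveys the baseline idea of sparsifying a least-squares kernel expansion; here a plain regularised kernel regression stands in for the LS-SVM, the smallest-|alpha| pruning rule is the simple baseline, and the paper's refinement of scoring candidates by the residuals of all training patterns is not reproduced. The parameters `lam`, `gamma`, and `keep` are illustrative.

```python
# Toy pruning sketch for a sparse least-squares kernel model.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def prune_ls_kernel_model(X, y, lam=1.0, gamma=0.5, keep=20):
    idx = list(range(len(X)))

    def fit(active):
        Kss = rbf_kernel(X[active], X[active], gamma=gamma)
        Kns = rbf_kernel(X, X[active], gamma=gamma)
        # Regularised least-squares fit of y using only the kept basis points.
        return np.linalg.solve(Kns.T @ Kns + lam * Kss, Kns.T @ y)

    alpha = fit(idx)
    while len(idx) > keep:
        idx.pop(int(np.argmin(np.abs(alpha))))   # drop the least important basis point
        alpha = fit(idx)                         # refit on the remaining basis points
    return np.array(idx), alpha
```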