Filtered Gaussian Processes for Learning with Large Data-Sets

Fast Gaussian Process Regression for Big Data

Big Data Research

Gaussian Processes are widely used for regression tasks. A known limitation of Gaussian Process regression is that computing the solution requires inverting, and storing in memory, a large matrix; these costs restrict its application to small and moderate-sized data sets. We present an algorithm that combines estimates from models fitted on subsets of the data, obtained in a manner similar to the bootstrap. The subset sample size is the critical parameter of this algorithm, and we provide guidelines for reasonable parameter choices based on a detailed experimental study. Various techniques have been proposed to scale Gaussian Processes to large regression tasks, and the most appropriate choice depends on the problem context. The proposed method is best suited to problems where an additive model works well and the response depends on a small number of features; the minimax rate of convergence for such problems is attractive, so effective models can be built from a small subset of the data. The Stochastic Variational Gaussian Process and the Sparse Gaussian Process are also appropriate choices for such problems; these methods pick a subset of the data based on theoretical considerations, whereas the proposed algorithm uses bagging and random sampling. Results from experiments conducted as part of this study indicate that the proposed algorithm can be as effective as these methods.
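
The bagging-and-subsampling idea described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact algorithm: it fits independent GPs on random subsets using scikit-learn's GaussianProcessRegressor and averages their predictive means; the kernel, subset size, and number of models are illustrative choices rather than the values recommended in the study.

```python
# Minimal sketch: bagged GP regression on random subsets (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def bagged_gp_predict(X, y, X_test, n_models=10, subset_size=500, seed=0):
    """Fit GPs on random subsets of the data and average their predictive means."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        gp.fit(X[idx], y[idx])
        preds.append(gp.predict(X_test))
    return np.mean(preds, axis=0)
```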

Efficient Gaussian process regression for large datasets

Biometrika, 2013

Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of $n^3$ where $n$ is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing $n$. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples.
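
A minimal sketch of the projection idea, under the assumption of a squared-exponential kernel and a Gaussian random projection: the $n$ observations are compressed to $m$ values $\Phi y$, so the matrix that must be inverted is $m \times m$ rather than $n \times n$. The full kernel matrix is still formed below purely for clarity; this illustrates the general approach, not the authors' exact estimator.

```python
# Sketch: GP predictive mean after projecting the observations onto an
# m-dimensional subspace with a random matrix Phi (m x n). Illustrative only.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def compressed_gp_mean(X, y, X_test, m=200, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random projection matrix
    K = rbf_kernel(X, X)                             # n x n kernel, formed here only for clarity
    Ks = rbf_kernel(X_test, X)
    A = Phi @ (K + noise**2 * np.eye(n)) @ Phi.T     # only an m x m matrix is inverted
    alpha = np.linalg.solve(A, Phi @ y)              # condition on the compressed data Phi @ y
    return Ks @ Phi.T @ alpha
```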

A Greedy approximation scheme for Sparse Gaussian process regression

2018

In their standard form, Gaussian processes (GPs) provide a powerful non-parametric framework for regression and classification tasks. Their one limiting property is their $\mathcal{O}(N^{3})$ scaling, where $N$ is the number of training data points. In this paper we present a framework for GP training with sequential selection of training data points using an intuitive selection metric. The greedy forward selection strategy is devised to target two factors: regions of high predictive uncertainty and underfit. Under this technique the complexity of GP training is reduced to $\mathcal{O}(M^{3})$, where $M \ll N$, if $M$ data points (out of $N$) are eventually selected. The sequential nature of the algorithm circumvents the need to invert the covariance matrix of dimension $N \times N$ and enables the use of favourable matrix-inverse update identities. We outline the algorithm and sequential updates to the posterior mean and variance. We demonstrate our method on selected one dimensional...
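
A hedged sketch of greedy forward selection: at each step, the point where the current model is most uncertain is added to the active set. For clarity it refits the GP from scratch with scikit-learn rather than applying the rank-one inverse-update identities the paper describes, and it uses only the predictive-uncertainty criterion, not the underfit criterion.

```python
# Sketch: greedy forward selection of an active set by predictive uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def greedy_gp_subset(X, y, n_select=100, seed=0):
    rng = np.random.default_rng(seed)
    active = list(rng.choice(len(X), size=2, replace=False))   # small random seed set
    while len(active) < n_select:
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        gp.fit(X[active], y[active])
        _, std = gp.predict(X, return_std=True)
        std[active] = -np.inf                                   # never re-select a point
        active.append(int(np.argmax(std)))                      # most uncertain point next
    return active
```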

Sparse Information Filter for Fast Gaussian Process Regression

Machine Learning and Knowledge Discovery in Databases. Research Track, 2021

Gaussian processes (GPs) are an important tool in machine learning and applied mathematics, with applications ranging from Bayesian optimization to calibration of computer experiments. They constitute a powerful kernelized non-parametric method with well-calibrated uncertainty estimates; however, off-the-shelf GP inference procedures are limited to datasets with a few thousand data points because of their cubic computational complexity. For this reason, many sparse GP techniques have been developed in recent years. In this paper, we focus on GP regression tasks and propose a new algorithm to train variational sparse GP models. An analytical posterior update expression based on the Information Filter is derived for the variational sparse GP model. We benchmark our method on several real datasets with millions of data points against the state-of-the-art Stochastic Variational GP (SVGP) and sparse orthogonal variational inference for Gaussian Processes (SOLVEGP). Our method achieves performance comparable to SVGP and SOLVEGP while providing considerable speed-ups. Specifically, it is consistently four times faster than SVGP and on average 2.5 times faster than SOLVEGP.
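
The flavour of an analytical, batch-wise posterior update can be sketched for the simplest case: fixed inducing points and hyperparameters, where the optimal sparse-GP posterior over the inducing outputs has a Titsias-style closed form whose sufficient statistics can be accumulated over mini-batches, loosely in the spirit of an information filter. This is an illustration of the idea, not the paper's variational algorithm; the kernel, noise level, and inducing points Z are assumed given.

```python
# Sketch: accumulate sufficient statistics of the optimal sparse-GP posterior
# over mini-batches, then form the posterior mean of the inducing outputs.
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

def sparse_gp_posterior(batches, Z, noise=0.1):
    m = len(Z)
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(m)
    KzK, Kzy = np.zeros((m, m)), np.zeros(m)
    for Xb, yb in batches:                 # one pass over mini-batches
        Kzb = rbf(Z, Xb)
        KzK += Kzb @ Kzb.T                 # accumulate K_zn K_nz
        Kzy += Kzb @ yb                    # accumulate K_zn y
    Sigma = Kzz + KzK / noise**2           # information-style accumulation
    mu_u = Kzz @ np.linalg.solve(Sigma, Kzy) / noise**2   # closed-form posterior mean of u
    return mu_u, Sigma, Kzz
```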

Learning filters in Gaussian process classification problems

2014 IEEE International Conference on Image Processing (ICIP), 2014

Many real classification tasks are oriented to sequence (neighbour) labeling, that is, assigning a label to every sample of a signal while taking into account the sequentiality (or neighbourhood) of the samples. This is normally approached by first filtering the data and then performing classification; as a consequence, the two processes are optimized separately, with no guarantee of global optimality. In this work we use Bayesian modeling and inference to jointly learn a classifier and estimate an optimal filterbank. Variational Bayesian inference is used to approximate the posterior distributions of all unknowns, resulting in an iterative procedure to estimate the classifier parameters and the filterbank coefficients. In the experimental section we show, using synthetic and real data, that the proposed method compares favorably with other classification/filtering approaches, without the need for parameter tuning.

Recursive estimation for sparse Gaussian process regression

Automatica, 2020

Gaussian Processes (GPs) are powerful kernelized methods for non-parametric regression used in many applications. However, their use is limited to a few thousand training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. In this work we investigate the connection between a general class of sparse inducing-point GP regression methods and Bayesian recursive estimation, which enables Kalman-filter-like updating for online learning. The majority of previous work has focused on the batch setting, in particular for learning the model parameters and the positions of the inducing points; here instead we focus on training with mini-batches. By exploiting the Kalman filter formulation, we propose a novel approach that estimates such parameters by recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state-of-the-art methods, our method keeps analytic updates for the mean and covariance of the posterior, thus drastically reducing the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state-of-the-art sequential Gaussian process regression on synthetic GP data as well as real-world data with up to a million data samples.
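
The recursive building block behind such methods is the analytical Gaussian measurement update. Below is a minimal Kalman-filter-style update for a Bayesian linear-Gaussian model, which is what a sparse GP reduces to once the inducing-point features are fixed; the feature matrix Phi and noise level are illustrative stand-ins, and the gradient propagation for hyperparameters and inducing points described in the paper is not shown.

```python
# Sketch: one recursive (Kalman-style) update of p(w | data) for y = Phi w + eps.
import numpy as np

def recursive_update(mean, cov, Phi, y, noise=0.1):
    S = Phi @ cov @ Phi.T + noise**2 * np.eye(len(y))   # innovation covariance
    K = cov @ Phi.T @ np.linalg.inv(S)                  # Kalman gain
    mean = mean + K @ (y - Phi @ mean)                  # updated posterior mean
    cov = cov - K @ Phi @ cov                           # updated posterior covariance
    return mean, cov

# Usage: start from the prior and fold in mini-batches one at a time, e.g.
# mean, cov = np.zeros(d), np.eye(d)
# for Phi_b, y_b in minibatches:
#     mean, cov = recursive_update(mean, cov, Phi_b, y_b)
```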

Trading-off Data Fit and Complexity in Training Gaussian Processes with Multiple Kernels

Lecture Notes in Computer Science, 2019

Gaussian processes (GPs) belong to a class of probabilistic techniques that have been successfully used in different domains of machine learning and optimization. They are popular because they provide uncertainties in predictions, which sets them apart from modelling methods that provide only point predictions. The uncertainty is particularly useful for decision making, as we can gauge how reliable a prediction is. One of the fundamental challenges in using GPs is that the efficacy of a model depends on selecting an appropriate kernel and the associated hyperparameter values for a given problem. Furthermore, training a GP, that is, optimizing its hyperparameters on a data set, is traditionally performed using a cost function that is a weighted sum of data fit and model complexity, and the underlying trade-off is completely ignored. Addressing these challenges and shortcomings, in this article we propose the following automated training scheme. Firstly, we use a weighted product of multiple kernels, with a view to relieving users of the need to choose an appropriate kernel for the problem at hand without any domain-specific knowledge. Secondly, for the first time, we modify GP training by using a multi-objective optimizer to tune the hyperparameters and weights of multiple kernels and extract an approximation of the complete trade-off front between data fit and model complexity. We then propose a novel solution selection strategy based on mean standardized log loss (MSLL) to select a solution from the estimated trade-off front and finalise training of a GP model. The results on three data sets and a comparison with the standard approach clearly show the potential benefit of the proposed approach of using multi-objective optimization with multiple kernels.
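
The two objectives being traded off are the standard components of the GP log marginal likelihood: a quadratic data-fit term and a log-determinant complexity penalty. The sketch below computes the two terms separately, as a multi-objective treatment requires; the kernel matrix and noise level are assumed given, and the multi-kernel weighting and the multi-objective optimizer themselves are not shown.

```python
# Sketch: the data-fit and complexity terms of the GP negative log marginal likelihood.
import numpy as np

def fit_and_complexity(K, y, noise=0.1):
    Ky = K + noise**2 * np.eye(len(y))
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = 0.5 * y @ alpha                 # 0.5 * y^T (K + sigma^2 I)^-1 y
    complexity = np.sum(np.log(np.diag(L)))    # 0.5 * log det(K + sigma^2 I)
    return data_fit, complexity                # standard training minimizes their sum
```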

Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression

Proceedings of the AAAI Conference on Artificial Intelligence, 2020

This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due...
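
One common way to turn a discrete kernel choice into a continuous, reparameterizable variable is the Gumbel-softmax relaxation, sketched below. This illustrates the general reparameterization idea only and is not claimed to be the paper's exact construction; the logits, temperature, and candidate kernels are illustrative.

```python
# Sketch: relaxing a discrete kernel choice via the Gumbel-softmax trick.
import numpy as np

def gumbel_softmax_weights(logits, temperature=0.5, rng=None):
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=len(logits))))   # Gumbel(0, 1) noise
    z = (logits + g) / temperature
    return np.exp(z - np.max(z)) / np.sum(np.exp(z - np.max(z)))  # soft one-hot weights

# Usage: sample soft weights over candidate kernels and mix their Gram matrices,
# so gradients can flow back into the kernel belief (logits), e.g.
# w = gumbel_softmax_weights(np.zeros(3))
# K = w[0] * K_rbf + w[1] * K_matern + w[2] * K_linear
```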

Two-step Gaussian Process Regression Improving Performance of Training and Prediction

Proceedings of the 2018 International Conference on Computer Science, Electronics and Communication Engineering (CSECE 2018), 2018

Since Gaussian process regression (GPR) cannot feasibly be applied to big and growing data sets, this paper introduces an integration algorithm called Two-step Gaussian Process Regression (TGPR) that speeds up both training and prediction. We first review the basics of regular GPR, then introduce TGPR, which uses inducing inputs to accelerate the regular GPR algorithm. Finally, TGPR is applied to a three-dimensional model; the experimental results, compared with regular GPR, show that TGPR is faster and more accurate.

Large-Scale Gaussian Process Classification with Flexible Adaptive Histogram Kernels

We present how to perform exact large-scale multi-class Gaussian process classification with parameterized histogram intersection kernels. In contrast to previous approaches, we use a full Bayesian model without any sparse approximation techniques, which allows for learning in sub-quadratic time and classification in constant time. To handle the additional model flexibility induced by parameterized kernels, our approach is able to optimize the parameters with large-scale training data. A key ingredient of this optimization is a new efficient upper bound on the negative Gaussian process log-likelihood. Experiments on image categorization tasks show substantial performance gains from flexible kernels, with learning within a few minutes and classification in microseconds on databases where exact Gaussian process inference was not possible before.
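
For reference, a (generalized) histogram intersection kernel between L1-normalized histograms can be computed as below; the exponent beta is one illustrative way to parameterize the kernel and is not necessarily the parameterization used by the authors.

```python
# Sketch: (generalized) histogram intersection kernel matrix between two sets of histograms.
import numpy as np

def histogram_intersection_kernel(A, B, beta=1.0):
    """K[i, j] = sum_d min(A[i, d]**beta, B[j, d]**beta) for histogram rows A[i], B[j]."""
    return np.sum(np.minimum(A[:, None, :]**beta, B[None, :, :]**beta), axis=2)
```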