Sparse Information Filter for Fast Gaussian Process Regression
Related papers
Efficient Optimization for Sparse Gaussian Process Regression
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015
We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates an inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-the-art performance in discrete cases and competitive results in the continuous case.
A Greedy approximation scheme for Sparse Gaussian process regression
2018
In their standard form, Gaussian processes (GPs) provide a powerful non-parametric framework for regression and classification tasks. Their one limiting property is their $\mathcal{O}(N^{3})$ scaling, where $N$ is the number of training data points. In this paper we present a framework for GP training with sequential selection of training data points using an intuitive selection metric. The greedy forward selection strategy is devised to target two factors: regions of high predictive uncertainty and underfit. Under this technique the complexity of GP training is reduced to $\mathcal{O}(M^{3})$, where $M \ll N$, if $M$ data points (out of $N$) are eventually selected. The sequential nature of the algorithm circumvents the need to invert the covariance matrix of dimension $N \times N$ and enables the use of favourable matrix inverse update identities. We outline the algorithm and sequential updates to the posterior mean and variance. We demonstrate our method on selected one dimensional...
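The greedy forward selection described in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the score used here (predictive variance plus squared residual) is a hypothetical stand-in for the paper's selection metric, the seed point and the kernel are assumptions, and each step re-solves the small system from scratch instead of using the matrix inverse update identities the abstract mentions.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def greedy_select(x, y, m, noise=1e-2):
    """Return m distinct training indices chosen by greedy forward selection."""
    chosen = [int(np.argmax(np.abs(y)))]  # arbitrary seed point (assumption)
    while len(chosen) < m:
        xs, ys = x[chosen], y[chosen]
        # Only a len(chosen) x len(chosen) system is ever solved, never N x N.
        K = rbf(xs, xs) + noise * np.eye(len(chosen))
        Ks = rbf(x, xs)
        mean = Ks @ np.linalg.solve(K, ys)
        # Prior variance is 1 for this kernel; subtract the part explained by the subset.
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
        score = var + (y - mean) ** 2   # hypothetical score: uncertainty + underfit
        score[chosen] = -np.inf         # never reselect a point
        chosen.append(int(np.argmax(score)))
    return chosen

x = np.linspace(0.0, 5.0, 50)
y = np.sin(2.0 * x)
subset = greedy_select(x, y, m=8)
```

Because every solve involves only the points selected so far, the cost per step depends on the subset size rather than on the full training set size.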
Recursive estimation for sparse Gaussian process regression
Automatica, 2020
Gaussian Processes (GPs) are powerful kernelized methods for non-parametric regression used in many applications. However, their use is limited to a few thousand training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. In this work we investigate the connection between a general class of sparse inducing point GP regression methods and Bayesian recursive estimation, which enables Kalman-filter-like updating for online learning. The majority of previous work has focused on the batch setting, in particular for learning the model parameters and the position of the inducing points; here we instead focus on training with mini-batches. By exploiting the Kalman filter formulation, we propose a novel approach that estimates such parameters by recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state-of-the-art methods, our method keeps analytic updates for the mean and covariance of the posterior, thus drastically reducing the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state-of-the-art sequential Gaussian Process regression on synthetic GP as well as real-world data with up to a million data samples.
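A rough sketch of what Kalman-filter-like updating over mini-batches can look like for an inducing-point model follows. It assumes a projection-type approximation in which each batch is observed through `H = K_bu K_uu^{-1}` with i.i.d. Gaussian noise; the function name `recursive_update`, the kernel, and the data are illustrative, and the gradient propagation for hyperparameters described in the abstract is not shown.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def recursive_update(mean, cov, H, y, noise):
    """One Kalman-style measurement update of the posterior over inducing values."""
    S = H @ cov @ H.T + noise * np.eye(len(y))  # innovation covariance
    G = np.linalg.solve(S, H @ cov).T           # gain = cov H^T S^{-1}
    new_mean = mean + G @ (y - H @ mean)
    new_cov = cov - G @ S @ G.T
    return new_mean, new_cov

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
z = np.linspace(0.0, 5.0, 10)          # inducing inputs (fixed here, an assumption)
noise = 0.01
Kuu = rbf(z, z) + 1e-8 * np.eye(len(z))

# Start from the prior over inducing values and absorb one mini-batch at a time.
mean, cov = np.zeros(len(z)), Kuu.copy()
for start in range(0, len(x), 50):
    xb, yb = x[start:start + 50], y[start:start + 50]
    H = np.linalg.solve(Kuu, rbf(z, xb)).T  # K_bu K_uu^{-1}
    mean, cov = recursive_update(mean, cov, H, yb, noise)
```

For fixed hyperparameters and this linear-Gaussian model, processing the mini-batches in sequence gives the same posterior as a single update with all the data, which is what makes the recursive form attractive for streaming.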
Correlated Product of Experts for Sparse Gaussian Process Regression
Cornell University - arXiv, 2021
Gaussian processes (GPs) are an important tool in machine learning and statistics, with applications ranging from social and natural science through engineering. They constitute a powerful kernelized non-parametric method with well-calibrated uncertainty estimates; however, off-the-shelf GP inference procedures are limited to datasets with several thousand data points because of their cubic computational complexity. For this reason, many sparse GP techniques have been developed over the past years. In this paper, we focus on GP regression tasks and propose a new approach based on aggregating predictions from several local and correlated experts. The degree of correlation between the experts can vary from independent to fully correlated. The individual predictions of the experts are aggregated taking into account their correlation, resulting in consistent uncertainty estimates. Our method recovers independent Product of Experts, sparse GP and full GP in the limiting cases. The presented framework can deal with a general kernel function and multiple variables, and has a time and space complexity which is linear in the number of experts and data samples, which makes our approach highly scalable. We demonstrate superior performance, in a time vs. accuracy sense, of our proposed method against state-of-the-art GP approximation methods for synthetic as well as several real-world datasets with deterministic and stochastic optimization.
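For orientation, the independent Product of Experts that the abstract names as one limiting case combines Gaussian predictions by adding precisions. The sketch below shows only that independent case; the correlated aggregation proposed in the paper is not reproduced, and the function name `product_of_experts` is illustrative.

```python
import numpy as np

def product_of_experts(means, variances):
    """Combine independent Gaussian expert predictions by adding precisions."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum(axis=0)             # combined variance
    mean = var * (precisions * means).sum(axis=0)  # precision-weighted mean
    return mean, var

# Two equally confident experts: the mean is averaged and the variance halves.
mean, var = product_of_experts([1.0, 3.0], [2.0, 2.0])
```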
Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression
Proceedings of the AAAI Conference on Artificial Intelligence, 2020
This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due...
A Tutorial on Sparse Gaussian Processes and Variational Inference
arXiv (Cornell University), 2020
Gaussian processes (GPs) provide a mathematically elegant framework for Bayesian inference and they can offer principled uncertainty estimates for a large range of problems. For example, if we consider certain regression problems with Gaussian likelihoods, a GP model enjoys a posterior in closed form. However, identifying the posterior GP scales cubically with the number of training examples and furthermore requires storing all training examples in memory. In order to overcome these practical obstacles, sparse GPs have been proposed that approximate the true posterior GP with a set of pseudo-training examples (a.k.a. inducing inputs or inducing points). Importantly, the number of pseudo-training examples is user-defined and enables control over computational and memory complexity. In the general case, sparse GPs do not enjoy closed-form solutions and one has to resort to approximate inference. In this context, a convenient choice for approximate inference is variational inference (VI), where the problem of Bayesian inference is cast as an optimization problem, namely to maximize a lower bound of the logarithm of the marginal likelihood. This paves the way for a powerful and versatile framework, where pseudo-training examples are treated as optimization arguments of the approximate posterior that are jointly identified together with hyperparameters of the generative model (i.e. prior and likelihood) in the course of training. The framework can naturally handle a wide scope of supervised learning problems, ranging from regression with heteroscedastic and non-Gaussian likelihoods to classification problems with discrete labels, but also problems where the regression or classification targets are multidimensional. The purpose of this tutorial is to provide access to the basic matter for readers without prior knowledge of either GPs or VI.
It turns out that a proper exposition of the subject also enables convenient access to more recent advances in the field of GPs (like importance-weighted VI as well as interdomain, multioutput and deep GPs) that can serve as an inspiration for exploring new research ideas.
Adaptive Sparse Gaussian Process
arXiv (Cornell University), 2023
Adaptive learning is necessary for non-stationary environments where the learning machine needs to forget the past data distribution. Efficient algorithms require a compact model update whose computational burden does not grow with the incoming data and that keeps the cost of online parameter updating as low as possible. Existing solutions only partially cover these needs. Here, we propose the first adaptive sparse Gaussian Process (GP) able to address all these issues. We first reformulate a variational sparse GP algorithm to make it adaptive through a forgetting factor. Next, to make the model inference as simple as possible, we propose updating a single inducing point of the sparse GP model together with the remaining model parameters every time a new sample arrives. As a result, the algorithm presents a fast convergence of the inference process, which allows an efficient model update (with a single inference iteration) even in highly non-stationary environments. Experimental results demonstrate the capabilities of the proposed algorithm and its good performance in modeling the predictive posterior in mean and confidence interval estimation compared to state-of-the-art approaches.
Variational learning of inducing variables in sparse Gaussian processes
2009
Sparse Gaussian process methods that use inducing variables require the selection of the inducing inputs and the kernel hyperparameters. We introduce a variational formulation for sparse approximations that jointly infers the inducing inputs and the kernel hyperparameters by maximizing a lower bound of the true log marginal likelihood. The key property of this formulation is that the inducing inputs are defined to be variational parameters which are selected by minimizing the Kullback-Leibler divergence between the variational distribution and the exact posterior distribution over the latent function values. We apply this technique to regression and we compare it with other approaches in the literature.
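The lower bound referred to in this abstract is commonly written as log N(y | 0, Q_nn + σ²I) − tr(K_nn − Q_nn) / (2σ²), with Q_nn = K_nm K_mm^{-1} K_mn. The sketch below evaluates that expression naively with dense N x N matrices, purely to show its structure; an efficient implementation would avoid forming them. The kernel, data, and fixed inducing inputs are assumptions, and no optimization of the inducing inputs is shown.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def log_gaussian(y, C):
    """log N(y | 0, C) for a dense covariance matrix C."""
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (quad + logdet + len(y) * np.log(2.0 * np.pi))

def variational_bound(x, y, z, noise):
    """Collapsed variational lower bound on the log marginal likelihood."""
    Kmm = rbf(z, z) + 1e-8 * np.eye(len(z))   # small jitter for stability
    Kmn = rbf(z, x)
    Qnn = Kmn.T @ np.linalg.solve(Kmm, Kmn)   # low-rank approximation of K_nn
    trace_term = np.trace(rbf(x, x) - Qnn)    # penalizes what Q_nn fails to explain
    return log_gaussian(y, Qnn + noise * np.eye(len(x))) - 0.5 * trace_term / noise

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 60)
y = np.sin(2.0 * x) + 0.3 * rng.standard_normal(60)
z = np.linspace(0.0, 5.0, 6)                  # inducing inputs
noise = 0.1

bound = variational_bound(x, y, z, noise)
exact = log_gaussian(y, rbf(x, x) + noise * np.eye(len(x)))
```

Here `bound` never exceeds `exact`, and the gap shrinks as the inducing inputs explain more of the training covariance.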
Fast near-GRID Gaussian process regression
Gaussian process regression (GPR) is a powerful non-linear technique for Bayesian inference and prediction. One drawback is its $O(N^3)$ computational complexity for both prediction and hyperparameter estimation for $N$ input points, which has led to much work in sparse GPR methods. If the covariance function is expressible as a tensor product kernel (TPK) and the inputs form a multidimensional grid, it was shown that the costs for exact GPR can be reduced to a sub-quadratic function of $N$. We extend these exact fast algorithms to sparse GPR and remark on a connection to Gaussian process latent variable models (GPLVMs). In practice, the inputs may also violate the multidimensional grid constraints, so we pose and efficiently solve missing and extra data problems for both exact and sparse grid GPR. We demonstrate our method on synthetic, text scan, and magnetic resonance imaging (MRI) data reconstructions.
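The saving for tensor product kernels on a grid comes from the Kronecker structure of the covariance matrix. A minimal two-dimensional sketch of the standard eigendecomposition trick for exact GPR follows; it does not cover the sparse extension or the missing and extra data problems treated in the paper, and the function name `kron_solve` and the grid sizes are illustrative.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def kron_solve(K1, K2, Y, noise):
    """Solve (K1 kron K2 + noise * I) a = vec(Y) without forming the Kronecker product.

    Y has shape (n1, n2) and vec() is row-major, matching Y.ravel().
    """
    w1, Q1 = np.linalg.eigh(K1)
    w2, Q2 = np.linalg.eigh(K2)
    T = Q1.T @ Y @ Q2                    # rotate into the joint eigenbasis
    T = T / (np.outer(w1, w2) + noise)   # divide by eigenvalues of K1 kron K2, plus noise
    return Q1 @ T @ Q2.T                 # rotate back

g1 = np.linspace(0.0, 1.0, 6)            # grid axis 1
g2 = np.linspace(0.0, 2.0, 5)            # grid axis 2
K1, K2 = rbf(g1, g1), rbf(g2, g2)
Y = np.sin(3.0 * g1)[:, None] * np.cos(g2)[None, :]
alpha = kron_solve(K1, K2, Y, noise=0.1)
```

Only the per-axis matrices are decomposed, so the cost is governed by the axis lengths rather than by the cube of the total number of grid points.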
Scalable GAM using sparse variational Gaussian processes
ArXiv, 2018
Generalized additive models (GAMs) are a widely used class of models of interest to statisticians as they provide a flexible way to design interpretable models of data beyond linear models. We here propose a scalable and well-calibrated Bayesian treatment of GAMs using Gaussian processes (GPs) and leveraging recent advances in variational inference. We use sparse GPs to represent each component and exploit the additive structure of the model to efficiently represent a Gaussian a posteriori coupling between the components.