Variational Inducing Kernels for Sparse Convolved Multiple Output Gaussian Processes

Efficient multioutput Gaussian processes through variational inducing kernels

2011

Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective, a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Álvarez and Lawrence recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction, and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.
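
The covariance that the CP construction yields has a simple closed form when the smoothing kernels and the latent covariance are Gaussian. The snippet below is an illustrative numpy sketch under that Gaussian-kernel assumption; the variable names (S, P, lam) and the sizes are my own choices, not the paper's code.

```python
# Hypothetical sketch of the convolution-process (CP) covariance for multiple
# outputs: each output is a Gaussian-kernel smoothing of one shared latent GP
# with a Gaussian-shaped covariance, so the cross-covariances below have a
# closed form.  Parameter names and values are illustrative only.
import numpy as np

def cp_cross_cov(x, xp, S_d, S_dp, P_d, P_dp, lam):
    """cov[f_d(x), f_d'(x')] for 1-D inputs under the Gaussian CP construction."""
    var = P_d + P_dp + lam                      # widths of both smoothing kernels + latent
    diff = x[:, None] - xp[None, :]
    return S_d * S_dp * np.exp(-0.5 * diff**2 / var) / np.sqrt(2.0 * np.pi * var)

def multioutput_cov(x, S, P, lam):
    """Full DN x DN covariance over D outputs evaluated at the same inputs x."""
    D, N = len(S), len(x)
    K = np.zeros((D * N, D * N))
    for d in range(D):
        for dp in range(D):
            K[d*N:(d+1)*N, dp*N:(dp+1)*N] = cp_cross_cov(x, x, S[d], S[dp], P[d], P[dp], lam)
    return K

x = np.linspace(0.0, 5.0, 30)
K = multioutput_cov(x, S=[1.0, 0.6], P=[0.2, 0.8], lam=0.5)
np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))   # PSD check (succeeds)
```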

Sparse convolved multiple output gaussian processes

2009

Recently there has been an increasing interest in methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian process perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different sparse approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so-called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data; in particular, we show results in pollution prediction, school exam score prediction and gene expression data.
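
To make the conditional-independence idea concrete, the following sketch shows the standard FITC/PITC constructions (not the paper's multi-output code): the exact covariance is replaced by a Nyström term plus a diagonal or per-output block-diagonal correction.

```python
# Illustrative FITC / PITC-style approximations: the exact covariance K_ff is
# replaced by the Nystrom term K_fu K_uu^{-1} K_uf plus a diagonal (FITC) or
# block-diagonal (PITC) correction, which is what the conditional-independence
# assumptions over outputs lead to.  Inputs and block layout are toy choices.
import numpy as np

def rbf(X, Z, ell=1.0, var=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return var * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, (60, 1))          # training inputs (two outputs stacked: blocks of 30)
Z = np.linspace(0, 5, 8)[:, None]       # inducing inputs for the shared latent function

Kff = rbf(X, X)
Kfu = rbf(X, Z)
Kuu = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
Qff = Kfu @ np.linalg.solve(Kuu, Kfu.T)           # Nystrom ("deterministic") part

fitc = Qff + np.diag(np.diag(Kff - Qff))          # FITC: diagonal correction
blocks = [slice(0, 30), slice(30, 60)]            # PITC: one block per output
pitc = Qff.copy()
for b in blocks:
    pitc[b, b] = Kff[b, b]                        # keep each output's own block exact
```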

Computationally Efficient Convolved Multiple Output Gaussian Processes

2020

Recently there has been an increasing interest in methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different sparse approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. Th...

Sparse Gaussian Processes via Parametric Families of Compactly-supported Kernels

arXiv, 2020

Gaussian processes are powerful models for probabilistic machine learning, but are limited in application by their O(N^3) inference complexity. We propose a method for deriving parametric families of kernel functions with compact spatial support, which yield naturally sparse kernel matrices and enable fast Gaussian process inference via sparse linear algebra. These families generalize known compactly-supported kernel functions, such as the Wendland polynomials. The parameters of this family of kernels can be learned from data using maximum likelihood estimation. Alternatively, we can quickly compute compact approximations of a target kernel using convex optimization. We demonstrate that these approximations incur minimal error over the exact models when modeling data drawn directly from a target GP, and can out-perform the traditional GP kernels on real-world signal reconstruction tasks, while exhibiting sub-quadratic inference complexity.
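
As a rough illustration of why compact support helps, the sketch below uses the classical Wendland phi_{3,1} kernel (one of the functions the proposed families generalize) to build a sparse Gram matrix and solve the GP linear system with sparse algebra. The data, cutoff radius and noise level are arbitrary choices, not the paper's experiments.

```python
# Minimal sketch of the compact-support idea: the Wendland phi_{3,1} kernel is
# exactly zero beyond a cutoff radius, so the Gram matrix is sparse and the
# GP linear system can be factorized with sparse linear algebra.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def wendland31(X, Z, radius=0.5):
    """Wendland phi_{3,1}: (1 - r)_+^4 (4 r + 1), positive definite for inputs in up to 3 dims."""
    r = np.abs(X[:, None] - Z[None, :]) / radius
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0 * r + 1.0), 0.0)

x = np.sort(np.random.default_rng(1).uniform(0, 10, 500))
y = np.sin(x) + 0.1 * np.random.default_rng(2).standard_normal(500)

K = wendland31(x, x, radius=0.5) + 1e-2 * np.eye(len(x))    # add noise variance on the diagonal
K_sparse = sparse.csc_matrix(K)
print("nonzero fraction:", K_sparse.nnz / K.size)            # small -> cheap sparse solves
alpha = spsolve(K_sparse, y)                                  # GP "weights" (K + noise)^{-1} y
```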

Deep Gaussian Processes with Convolutional Kernels

arXiv, 2018

Deep Gaussian processes (DGPs) provide a Bayesian non-parametric alternative to standard parametric deep learning models. A DGP is formed by stacking multiple GPs, resulting in a well-regularized composition of functions. The Bayesian framework that equips the model with attractive properties, such as implicit capacity control and predictive uncertainty, at the same time makes it challenging to combine with a convolutional structure. This has hindered the application of DGPs in computer vision tasks, an area where deep parametric models (i.e. CNNs) have made breakthroughs. Standard kernels used in DGPs, such as radial basis functions (RBFs), are insufficient for handling pixel variability in raw images. In this paper, we build on the recent convolutional GP to develop Convolutional DGP (CDGP) models which effectively capture image-level features through the use of convolution kernels, thereby opening up the way for applying DGPs to computer vision tasks. Our model learns local spatia...
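
The convolutional kernel that such layers build on can be illustrated with a toy patch-based kernel: an image-level covariance obtained by averaging a base RBF kernel over all pairs of image patches. The sketch below is a simplified stand-in with invented patch size and inputs, not the authors' implementation.

```python
# Rough sketch of a patch-response (convolutional) kernel: sum/average a base
# RBF kernel over all pairs of patches extracted from two images.
import numpy as np

def extract_patches(img, w=3):
    H, W = img.shape
    return np.array([img[i:i+w, j:j+w].ravel()
                     for i in range(H - w + 1) for j in range(W - w + 1)])

def rbf(A, B, ell=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

def conv_kernel(img1, img2, w=3, ell=1.0):
    P1, P2 = extract_patches(img1, w), extract_patches(img2, w)
    return rbf(P1, P2, ell).mean()          # average response over all patch pairs

rng = np.random.default_rng(0)
a, b = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(conv_kernel(a, b), conv_kernel(a, a))
```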

Structured Gaussian Processes with Twin Multiple Kernel Learning

2018

Vanilla Gaussian processes (GPs) have prohibitive computational needs for very large data sets. To overcome this difficulty, special structures in the covariance matrix, if they exist, should be exploited using decomposition methods such as the Kronecker product. In this paper, we integrated the Kronecker decomposition approach into a multiple kernel learning (MKL) framework for GP regression. We first formulated a regression algorithm with the Kronecker decomposition of structured kernels for spatiotemporal modeling, learning the contribution of spatial and temporal features as well as a model for out-of-sample prediction. We then evaluated the performance of our proposed computational framework, namely structured GPs with twin MKL, on two different real data sets to show its efficiency and effectiveness. MKL helped us extract the relative importance of input features by assigning weights to kernels calculated on different subsets of temporal and spatial features.
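
The computational payoff of the Kronecker structure comes from never forming the full covariance: with K = K_space ⊗ K_time, the system (K + σ²I)α = y can be solved from the eigendecompositions of the two small factors alone. The snippet below is a minimal numpy sketch of that standard trick, with illustrative kernels and grid sizes rather than the paper's setup.

```python
# Kronecker trick sketch: solve (K_s kron K_t + s2*I) alpha = y using only the
# eigendecompositions of the small factor matrices K_s (40x40) and K_t (50x50).
import numpy as np

def rbf(x, ell):
    d2 = (x[:, None] - x[None, :])**2
    return np.exp(-0.5 * d2 / ell**2)

xs = np.linspace(0, 1, 40)     # spatial grid
xt = np.linspace(0, 1, 50)     # temporal grid
Ks, Kt = rbf(xs, 0.3), rbf(xt, 0.1)
y = np.random.default_rng(0).standard_normal(40 * 50)
s2 = 0.1

ls, Qs = np.linalg.eigh(Ks)
lt, Qt = np.linalg.eigh(Kt)
Y = y.reshape(40, 50)
T = Qs.T @ Y @ Qt                          # rotate into the joint eigenbasis
T /= (ls[:, None] * lt[None, :] + s2)      # divide by eigenvalues of K + s2*I
alpha = (Qs @ T @ Qt.T).ravel()            # rotate back

# check against the dense computation (only feasible because the example is tiny)
K = np.kron(Ks, Kt)
assert np.allclose(alpha, np.linalg.solve(K + s2 * np.eye(2000), y), atol=1e-6)
```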

Variational learning of inducing variables in sparse Gaussian processes

2009

Sparse Gaussian process methods that use inducing variables require the selection of the inducing inputs and the kernel hyperparameters. We introduce a variational formulation for sparse approximations that jointly infers the inducing inputs and the kernel hyperparameters by maximizing a lower bound of the true log marginal likelihood. The key property of this formulation is that the inducing inputs are defined to be variational parameters which are selected by minimizing the Kullback-Leibler divergence between the variational distribution and the exact posterior distribution over the latent function values. We apply this technique to regression and we compare it with other approaches in the literature.
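
The quantity being maximized is the collapsed lower bound ELBO = log N(y | 0, Q_ff + σ²I) − tr(K_ff − Q_ff) / (2σ²), with Q_ff = K_fu K_uu^{-1} K_uf; the inducing inputs Z enter only through this expression, which is why they can be treated as variational parameters. Below is a minimal numpy sketch of the bound, using a toy RBF kernel and data of my choosing.

```python
# Collapsed variational lower bound of Titsias (2009) for sparse GP regression.
import numpy as np

def rbf(A, B, ell=1.0, var=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return var * np.exp(-0.5 * d2 / ell**2)

def titsias_elbo(X, y, Z, s2=0.1, ell=1.0, var=1.0):
    n = len(y)
    Kuu = rbf(Z, Z, ell, var) + 1e-8 * np.eye(len(Z))
    Kfu = rbf(X, Z, ell, var)
    Qff = Kfu @ np.linalg.solve(Kuu, Kfu.T)
    cov = Qff + s2 * np.eye(n)
    L = np.linalg.cholesky(cov)
    a = np.linalg.solve(L, y)
    logN = -0.5 * (a @ a) - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
    trace_term = (var * n - np.trace(Qff)) / (2.0 * s2)   # tr(Kff - Qff); diag(Kff) = var
    return logN - trace_term                              # lower bound on log p(y)

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, (200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
Z = np.linspace(0, 5, 10)[:, None]       # inducing inputs = variational parameters
print(titsias_elbo(X, y, Z))
```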

Online Sparse Multi-Output Gaussian Process Regression and Learning

IEEE Transactions on Signal and Information Processing over Networks

This paper proposes an approach for online training of a sparse multi-output Gaussian process (GP) model using sequentially obtained data. The considered model linearly combines multiple latent sparse GPs to produce correlated output variables. Each latent GP has its own set of inducing points to achieve sparsity. We show that, given the model hyperparameters, the posterior over the inducing points is Gaussian under Gaussian noise, since they are linearly related to the model outputs. However, the inducing points from different latent GPs become correlated, leading to a full covariance matrix that is cumbersome to handle. Variational inference is thus applied and an approximate regression technique is obtained, with which the posteriors over the different inducing point sets always factorize. As the model outputs are non-linearly dependent on the hyperparameters, a novel marginalized particle filter (MPF)-based algorithm is proposed for the online inference of the inducing point values and hyperparameters. The approximate regression technique is incorporated in the MPF and its distributed realization is presented. Algorithm validation using synthetic and real data is conducted, and promising results are obtained.
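
The conjugacy exploited here is the standard Gaussian linear-model update: with a Gaussian prior over the stacked inducing values and outputs that depend on them linearly under Gaussian noise, the posterior over the inducing values is Gaussian in closed form. The sketch below illustrates that update with a generic linear map W standing in for the model's actual projection and mixing weights.

```python
# Closed-form Gaussian posterior for y = W u + N(0, s2*I), u ~ N(0, Kuu).
# W is a generic stand-in here, not the paper's exact construction.
import numpy as np

def gaussian_posterior(Kuu, W, y, s2):
    """Posterior mean and covariance of u given linear-Gaussian observations y."""
    prec = np.linalg.inv(Kuu) + W.T @ W / s2     # posterior precision
    Sigma = np.linalg.inv(prec)                  # posterior covariance
    mu = Sigma @ (W.T @ y) / s2                  # posterior mean
    return mu, Sigma

rng = np.random.default_rng(4)
M, N = 15, 100                       # inducing values, observations
Kuu = np.eye(M)                      # placeholder prior covariance over inducing values
W = rng.standard_normal((N, M))
u_true = rng.standard_normal(M)
y = W @ u_true + 0.1 * rng.standard_normal(N)
mu, Sigma = gaussian_posterior(Kuu, W, y, s2=0.01)
```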

Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning

2011

We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models. The algorithm is based on the spike and slab prior which, from a Bayesian perspective, is the gold standard for sparse inference. We apply the method to a general multi-task and multiple kernel learning model in which a common set of Gaussian process functions is linearly combined with task-specific sparse weights, thus inducing relationships between tasks. This model unifies several sparse linear models, such as generalized linear models, sparse factor analysis and matrix factorization with missing values, so that the variational algorithm can be applied to all these cases. We demonstrate our approach in multioutput Gaussian process regression, multi-class classification, image processing applications and collaborative filtering.
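
For readers unfamiliar with the prior: a spike-and-slab weight is an exact zero ("spike") with probability 1 − π and a Gaussian draw ("slab") otherwise. The short sketch below, with arbitrary parameter values, shows the sampling behaviour that makes this prior attractive for sparse inference.

```python
# Sampling from a spike-and-slab prior: w_i = s_i * b_i with
# s_i ~ Bernoulli(pi) and b_i ~ N(0, slab_var).
import numpy as np

def sample_spike_and_slab(n, pi=0.2, slab_var=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    s = rng.random(n) < pi                       # binary inclusion indicators
    b = rng.normal(0.0, np.sqrt(slab_var), n)    # slab values
    return s * b                                 # exact zeros where s_i = 0

w = sample_spike_and_slab(1000, pi=0.2)
print("fraction exactly zero:", np.mean(w == 0.0))   # roughly 1 - pi
```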