A conditional one-output likelihood formulation for multitask Gaussian processes
Related papers
Bayesian Online Multitask Learning of Gaussian Processes
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010
Standard single-task kernel methods have recently been extended to the case of multi-task learning in the context of regularization theory. There are experimental results, especially in biomedicine, showing the benefit of the multi-task approach compared to the single-task one. However, a possible drawback is computational complexity. For instance, when regularization networks are used, complexity scales as the cube of the overall number of training data, which may be large when several tasks are involved. The aim of this paper is to derive an efficient computational scheme for an important class of multi-task kernels. More precisely, a quadratic loss is assumed and each task consists of the sum of a common term and a task-specific one. Within a Bayesian setting, a recursive on-line algorithm is obtained that updates both estimates and confidence intervals as new data become available. The algorithm is tested on two simulated problems and a real dataset relating to xenobiotics administration in human patients.
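The "common term plus task-specific term" structure described above corresponds to a multi-task kernel that sums a shared covariance and a per-task covariance. A minimal NumPy sketch of that kernel follows; the hyperparameter values are illustrative, and the paper's recursive online updates are not reproduced:

```python
import numpy as np

def rbf(x, xp, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between two scalar inputs.
    return variance * np.exp(-0.5 * (x - xp) ** 2 / lengthscale ** 2)

def multitask_kernel(x, task, xp, taskp):
    """k((x,i), (x',j)) = k_common(x,x') + delta_ij * k_specific(x,x').
    Hyperparameter values below are illustrative, not from the paper."""
    k = rbf(x, xp, lengthscale=1.0, variance=1.0)       # shared component
    if task == taskp:
        k += rbf(x, xp, lengthscale=0.5, variance=0.3)  # task-specific component
    return k

# Joint covariance over all (input, task id) pairs from two tasks.
pairs = [(0.1, 0), (0.5, 0), (0.2, 1), (0.9, 1)]
K = np.array([[multitask_kernel(x, t, xp, tp) for xp, tp in pairs]
              for x, t in pairs])
```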
Multiple output Gaussian process regression
2005
Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise the kernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs.
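The convolution construction can be illustrated numerically: convolving a single white noise source with two different smoothing kernels produces two outputs that are coupled by construction. A rough discretized sketch (grid resolution and kernel widths are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared latent white-noise source on a fine grid.
grid = np.linspace(-5, 5, 2001)
dx = grid[1] - grid[0]
noise = rng.normal(size=grid.size) / np.sqrt(dx)   # discretized white noise

def smooth(width):
    # Convolve the shared noise with a Gaussian smoothing kernel of the
    # given width, normalized for (approximately) unit output variance.
    kern = np.exp(-0.5 * (grid / width) ** 2)
    kern /= np.sqrt(np.sum(kern ** 2) * dx)
    return dx * np.convolve(noise, kern, mode="same")

# Two coupled outputs: different widths, same underlying noise source,
# so they are correlated by construction (widths are illustrative).
y1, y2 = smooth(0.3), smooth(0.8)
print(np.corrcoef(y1, y2)[0, 1])   # clearly positive
```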
Approximate Inference in Related Multi-output Gaussian Process Regression
Lecture Notes in Computer Science, 2017
In Gaussian processes a multi-output kernel is a covariance function over correlated outputs. Using a relation between outputs that is known a priori, joint auto- and cross-covariance functions can be constructed. Realizations from these joint covariance functions give outputs that are consistent with the prior relation. One issue with Gaussian process regression is efficient inference when scaling up to large datasets. In this paper we apply approximate inference techniques to multi-output kernels enforcing relationships between outputs. Results of the proposed methodology for synthetic data and real-world applications are presented. The main contribution of this paper is the application and validation of our methodology on a dataset of real aircraft flight tests, while imposing knowledge of aircraft physics into the model.
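One common way such a prior relation arises is when one output is the derivative of another; the cross-covariances then follow by differentiating the kernel, so joint samples obey the relation exactly. A sketch for a squared-exponential kernel (the paper's relations come from aircraft physics; this derivative relation is a generic stand-in):

```python
import numpy as np

def k(x, xp, ell=1.0):
    # Squared-exponential kernel; lengthscale ell is illustrative.
    return np.exp(-0.5 * (x - xp) ** 2 / ell ** 2)

def k_df_f(x, xp, ell=1.0):
    # Cross-covariance cov(f'(x), f(x')) = d/dx k(x, x').
    return -(x - xp) / ell ** 2 * k(x, xp, ell)

def k_df_df(x, xp, ell=1.0):
    # Covariance of the derivative output: d^2/(dx dx') k(x, x').
    return (1.0 / ell ** 2 - (x - xp) ** 2 / ell ** 4) * k(x, xp, ell)
```

Stacking these blocks gives a joint covariance over [f, f'] whose samples respect the derivative relation by construction.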
Efficient Marginal Likelihood Computation for Gaussian Process Regression
2011
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be performed efficiently in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that the evaluation is usually computationally intensive and scales badly with the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires $O(N^3)$ operations for every iteration of the optimization procedure, where $N$ is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of $O(N^3)$, the evaluation of the score function, as well as the Jacobian and Hessian matrices, in $O(N)$ operations. We show how the proposed identities, which follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state-of-the-art approximations that rely on sparse kernel matrices.
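The key identity is easy to sketch for the noise hyperparameter alone: with the eigendecomposition $K = Q \Lambda Q^\top$ computed once and $\alpha = Q^\top y$, the negative log marginal likelihood of $y \sim N(0, K + \sigma^2 I)$ becomes a sum over eigenvalues, so each evaluation costs $O(N)$. A minimal NumPy sketch under that restriction (the paper's identities also cover further hyperparameters and derivatives):

```python
import numpy as np

def nlml_factory(K, y):
    """One-off O(N^3) eigendecomposition of the kernel matrix K, after
    which the negative log marginal likelihood of y ~ N(0, K + s2*I)
    is evaluated in O(N) per noise-variance value s2.
    (Sketch only: the paper also derives the Jacobian and Hessian.)"""
    lam, Q = np.linalg.eigh(K)      # eigenvalues lam, eigenvectors Q
    alpha = Q.T @ y                 # rotated targets
    n = y.size

    def nlml(s2):
        d = lam + s2                # eigenvalues of K + s2*I
        return 0.5 * (np.sum(alpha**2 / d) + np.sum(np.log(d))
                      + n * np.log(2.0 * np.pi))

    return nlml

# Example: scan candidate noise variances cheaply after the overhead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)
y = rng.multivariate_normal(np.zeros(200), K + 0.1 * np.eye(200))
nlml = nlml_factory(K, y)
print(min((nlml(s2), s2) for s2 in [0.01, 0.05, 0.1, 0.5, 1.0]))
```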
Gaussian process regression with functional covariates and multivariate response
Chemometrics and Intelligent Laboratory Systems, 2017
Gaussian process regression (GPR) has been shown to be a powerful and effective nonparametric method for regression, classification and interpolation, due to many of its desirable properties. However, most GPR models consider univariate or multivariate covariates only. In this paper we extend GPR models to cases where the covariates include both functional and multivariate variables and the response is multidimensional. The model naturally incorporates the two different types of covariates, multivariate and functional, and principal component analysis is used to de-correlate the multivariate response, which avoids the widely recognised difficulty in multi-output GPR models of formulating covariance functions that must describe the correlations not only between data points but also between responses. The usefulness of the proposed method is demonstrated through a simulated example and two real data sets in chemometrics.
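The decorrelation strategy can be sketched with off-the-shelf tools: project the multivariate response onto principal components, fit an independent GP per component, and map predictions back. The sketch below uses scikit-learn as a stand-in and omits the functional-covariate part of the model (data and kernel choices are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 2))      # multivariate covariates
Y = np.column_stack([np.sin(X @ w) for w in rng.normal(size=(4, 2))])

pca = PCA(n_components=3).fit(Y)
Z = pca.transform(Y)                     # de-correlated responses

# One independent GP per (approximately uncorrelated) component.
gps = [GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, Z[:, j])
       for j in range(Z.shape[1])]

def predict(Xnew):
    Znew = np.column_stack([gp.predict(Xnew) for gp in gps])
    return pca.inverse_transform(Znew)   # back to the original responses
```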
Hierarchical Gaussian process regression
2010
We address an approximation method for Gaussian process (GP) regression, where we approximate the covariance by a block matrix such that diagonal blocks are calculated exactly while off-diagonal blocks are approximated. Partitioning the input data points, we present a two-layer hierarchical model for GP regression: prototypes of clusters in the upper layer are used for coarse modeling by a GP, and data points in each cluster in the lower layer are used for fine modeling by an individual GP whose prior mean is given by the corresponding prototype and whose covariance is parameterized by the data points in the partition. In this hierarchical model, integrating out the latent variables in the upper layer leads to a block covariance matrix, where diagonal blocks contain similarities between data points in the same partition and off-diagonal blocks consist of approximate similarities calculated using prototypes. This particular structure of the covariance matrix divides the full GP into manageable sub-problems whose complexity scales with the number of data points in a partition. In addition, our hierarchical GP regression (HGPR) is also useful for cases where partitions of data reveal different characteristics. Experiments on several benchmark datasets confirm the useful behavior of our method.
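The block structure can be sketched as follows; the off-diagonal approximation used here (a constant block given by the prototype similarity) is an illustrative reading of the construction, not the paper's exact derivation, which guarantees a valid covariance:

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # Squared-exponential kernel matrix between row-sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def block_cov(clusters, prototypes):
    """Two-layer block covariance (sketch): diagonal blocks are exact
    kernels over the points in a cluster; each off-diagonal block is
    approximated via the similarity between the two cluster prototypes."""
    rows = []
    for i, Xi in enumerate(clusters):
        row = []
        for j, Xj in enumerate(clusters):
            if i == j:
                row.append(rbf(Xi, Xi))                      # exact block
            else:
                s = rbf(prototypes[i:i+1], prototypes[j:j+1])[0, 0]
                row.append(np.full((len(Xi), len(Xj)), s))   # prototype-based
        rows.append(np.hstack(row))
    return np.vstack(rows)
```

Because each sub-problem involves only one cluster's points, the exact computations scale with the partition sizes rather than the full dataset.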
Scalable High-Order Gaussian Process Regression
2019
While most Gaussian process (GP) work focuses on learning single-output functions, many applications, such as physical simulations and gene expression prediction, require estimating functions with many outputs. The number of outputs can be much larger than, or comparable to, the number of training samples. Existing multi-output GP models are either limited to low-dimensional outputs and restricted kernel choices, or assume oversimplified low-rank structures within the outputs. To address these issues, we propose HOGPR, a High-Order Gaussian Process Regression model, which can flexibly capture complex correlations among the outputs and scale up to a large number of outputs. Specifically, we tensorize the high-dimensional outputs, introducing latent coordinate features to index each tensor element (i.e., output) and to capture their correlations. We then generalize a multilinear model to a hybrid of a GP and latent GP model. The model is endowed with a Kronecker product structure ove...
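The computational appeal of a Kronecker-structured covariance is easy to sketch: solves and log-determinants for $K_{\mathrm{in}} \otimes K_{\mathrm{out}}$ reduce to operations on the factor matrices. A NumPy illustration (sizes and matrices are arbitrary; this is the generic Kronecker identity, not the HOGPR model itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 40
A = rng.normal(size=(n, n))
B = rng.normal(size=(m, m))
K_in = A @ A.T + n * np.eye(n)     # n x n input covariance
K_out = B @ B.T + m * np.eye(m)    # m x m output covariance

la, Qa = np.linalg.eigh(K_in)
lb, Qb = np.linalg.eigh(K_out)

# log|K_in kron K_out| = m*log|K_in| + n*log|K_out|, from the factors alone:
logdet = m * np.sum(np.log(la)) + n * np.sum(np.log(lb))

# Solve (K_in kron K_out) vec(X) = vec(Y) without forming the nm x nm
# matrix, using (A kron B) vec(X) = vec(B X A^T) with column-major vec:
Y = rng.normal(size=(m, n))
X = Qb @ ((Qb.T @ Y @ Qa) / np.outer(lb, la)) @ Qa.T
```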
Computer Physics Communications, 2022
We present an implementation of the RS-HDMR-GPR (Random Sampling High Dimensional Model Representation Gaussian Process Regression) method. The method builds representations of multivariate functions with lower-dimensional terms, either as an expansion over orders of coupling or using terms of only a given dimensionality. This facilitates, in particular, recovering functional dependence from sparse data. The method also allows for imputation of missing values of the variables and for a significant pruning of the useful number of HDMR terms. It can also be used to estimate the relative importance of different combinations of input variables, thereby adding an element of insight to a general machine learning method, in a way that can be viewed as extending the automatic relevance determination approach. The capabilities of the method and of the associated Python software tool are demonstrated on test cases involving synthetic analytic
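For reference, the lower-dimensional-term representation referred to here is the HDMR expansion (notation mine, not from the paper):

```latex
f(x_1, \dots, x_D) \approx f_0 + \sum_{i=1}^{D} f_i(x_i)
  + \sum_{i<j} f_{ij}(x_i, x_j) + \cdots
```

where, in RS-HDMR-GPR, each component function is represented by its own Gaussian process and the expansion is truncated at a chosen order of coupling or dimensionality.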
Learning curves for multi-task Gaussian process regression
We study the average case performance of multi-task Gaussian process (GP) regression as captured in the learning curve, i.e. the average Bayes error for a chosen task versus the total number of examples $n$ for all tasks. For GP covariances that are the product of an input-dependent covariance function and a free-form inter-task covariance matrix, we show that accurate approximations for the learning curve can be obtained for an arbitrary number of tasks $T$. We use these to study the asymptotic learning behaviour for large $n$. Surprisingly, multi-task learning can be asymptotically essentially useless, in the sense that examples from other tasks help only when the degree of inter-task correlation, $\rho$, is near its maximal value $\rho = 1$. This effect is most extreme for learning of smooth target functions as described by e.g. squared exponential kernels. We also demonstrate that when learning many tasks, the learning curves separate into an initial phase, where the Bayes error o...
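The covariance class studied here is the product form over inputs and tasks (notation mine):

```latex
\operatorname{cov}\bigl(f_t(x),\, f_{t'}(x')\bigr) = B_{tt'}\, k(x, x'),
\qquad B \in \mathbb{R}^{T \times T} \ \text{positive semi-definite},
```

and for two tasks with equal prior variance, the inter-task correlation $\rho$ in the analysis is the off-diagonal entry of the correlation-normalized $B$.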
Hierarchical Gaussian process mixtures for regression
2005
As a result of their good performance in practice and their desirable analytical properties, Gaussian process regression models are attracting increasing interest in statistics, engineering and other fields. However, two major problems arise when the model is applied to a large dataset with repeated measurements. One stems from the systematic heterogeneity among the different replications, and the other is the requirement to invert a covariance matrix, which arises in the implementation of the model.